INTEGRATION AND SEGREGATION IN SPEECH PERCEPTION*
Bruno H. Repp 



INTRODUCTION 

In this paper I present an overview of some recent research on speech perception. To reduce 
my task to manageable size, I have chosen to focus on the topics of perceptual integration and 
segregation, which have guided, more or less explicitly, a considerable amount of speech perception 
research and theorizing in recent years. This will be a selective review, therefore, but I hope it 
will nevertheless convey some of the flavor of contemporary ideas and findings, even though that 
flavor will be tinged with my own favorite spices. 

I. CONCEPTUAL FOUNDATIONS 

Integration and segregation are hypothetical perceptual functions (or processes) that link 
physical structures in the world with mental structures in the brain. An integrative function
maps multiple physical units (trivially, a single physical unit) onto a single mental unit, whereas
a segregative function maps multiple physical units (sometimes, paradoxically, a single physical
unit) onto different mental units. Though mutually exclusive for any particular physical structure 
at any given time, these two processes nevertheless cooperate in sorting a complex stream of 
sensory inputs into an orderly sequence of perceived objects and events. 

These definitions seem rather straightforward, but they rest on four important assumptions:
(1) The physical and mental worlds are not isomorphic. (2) There are objectively definable units 
in the physical world. (3) There are units in the mental world that are different from the physical 
units. (4) There are perceptual functions or processes that accomplish the mapping between the 
two types of units. I will briefly defend each of these assumptions; at the end of this presentation, 
I will consider the consequences of abandoning some or all of them. 

The first assumption, that the mental world is not isomorphic with the physical world, reflects
the facts that physical variables are filtered and transformed by sensory systems, that perception 
is a function not only of the current sensory input but also of the past history of the organism, 
and that there is often an element of choice in perception that permits alternative perceptual 
organizations for the same sensory input. Without this assumption, it would be difficult to say
anything meaningful about perception, except that it happens. 

*To appear in Proceedings of the Eleventh International Congress of Phonetic Sciences, Tallinn,
Estonia, USSR (1987).

Acknowledgment. Preparation of this paper was supported by NICHD Grant HD-01994 to Haskins
Laboratories. I am grateful to Ignatius Mattingly, Susan Nittrouer, and Michael Studdert-Kennedy
for helpful comments on an earlier version.
The second assumption, concerning the existence of physical units, is necessary in order to be
able to talk about perceptual integration: These units or dimensions are what is being integrated.
Perceptual segregation, too, ordinarily implies that certain objective lines of division can be found
in the sensory input. It is always possible to find a physical description that is more finely grained 
than our description of the perceptual end product. The fact that the machines we use to assess 
physical characteristics of speech are mere transducers (or, at best, model only peripheral auditory 
processes) generally assures a mismatch between physical and perceptual descriptions even when 
the grain size is comparable (and even though our visual perception is engaged in interpreting 
the machine outputs). Although there are different ways of characterizing the physical energy 
pattern, they are all equally valid for descriptive purposes. It is an empirical question whether or 
not perceivers are sensitive to any observed physical divisions, that is, whether these divisions can 
serve as the basis for perceptual segregation or whether they are bridged by integrative processes. 
Research of this kind may enable us to find a physical description with a simpler mapping onto 
perceptual units. 

The third assumption concerns the existence and nature of perceptual (mental) units. There
is no theory of speech perception that does not assume mental units, usually the ones supplied
by linguistic theory. The argument has been over the "perceptual reality" of syllables, phonemes,
and features, and over their relative primacy in perceptual processing (see, e.g., Jaeger, 1980;
Lehiste, 1972; Massaro, 1975; McNeill & Lindig, 1973; Savin & Bever, 1970). However, which
level of the linguistic hierarchy is perceptually and behaviorally salient depends very much on the
task and the situation a perceiver is in. As McNeill and Lindig (1973, p. 430) have aptly put it,
"what is 'perceptually real' is what one pays attention to." The validity of the basic linguistic
categories, questions of detail aside, is guaranteed by the success of linguistic analysis. Linguistic
units provide us with a vocabulary in which to describe the time course of accumulation and
perceptual processing of linguistic information. Even though the perceptual processes themselves
may be of an analog nature, we need discrete concepts to theorize and communicate about these 
processes. From this perspective, it is not an empirical issue but a fact that perceivers process 
features, phonemes, syllables, words, etc., since they are what speech is made of. Their awareness 
of these categories is another matter that shall not concern us here. (See Mann, 1986; Mattingly, 
1972; Morais, Cary, Alegria, & Bertelson, 1979.) Clearly, speech perception generally proceeds
without awareness of all but the highest levels of description (i.e., the meaning of the message).

The fourth assumption is that there are perceptual processes in the brain that map sensory 
inputs onto internal structures. While such processes have been traditionally assumed in psychol- 
ogy since the demise of radical behaviorism, a new challenge (to the other assumptions as well) 
comes from the so-called direct realist school of perception, which claims that perceptual systems
merely "pick up" the information delivered by the senses (Fowler, 1986; Gibson, 1966). I will 
return to this issue later. Here I merely note that the same input is not always perceived in the 
same way. Contextual factors, past experience, expectations, and strategies may alter the
perceptual outcome, and this seems to require the assumption of perceptual processes that mediate
between the input and the perceiver's interpretation of it. Whether these processes (and indeed, 
integration and segregation as such) are thought of as neural events with actual time and space 
coordinates or as abstract functional relationships between physical and mental descriptions is 
irrelevant to most of the research I will discuss here. 

Having attempted to justify the four principal assumptions, I must still mention two
issues that are important in much research on perceptual integration and segregation. One is the 
question of whether the processes inferred are specific to the perception of speech or whether they
represent general capacities of the auditory or cognitive system. By a speech-specific function I
mean one that operates on properties that are unique to speech. There is no question that general
capacities to integrate and segregate are common to all perceptual and cognitive systems. Speech 
perception presumably results from a combination of general and speech-specific perceptual func- 
tions (see, e.g., Diehl, 1987), just as speech resembles other sounds in some respects and differs 
in others. One frequent research strategy, therefore, is to determine whether or not particular 
instances of integration or segregation can be observed in both speech and nonspeech perception. 
This question can be asked only if the physical characteristics of speech and nonspeech stimuli 
are comparable — a condition that is notoriously difficult to satisfy (see, e.g., Pisoni, 1987). The 
mental descriptions of speech and nonspeech are, by definition, different at some higher level; thus 
the empirical question is whether that level is engaged in a particular integrative or segregative
process. 

The other issue is whether a particular integrative or segregative function is obligatory or 
optional. This question is sometimes linked with that of speech-specificity in that a higher- 
level, speech-specific function might seem easier to disengage than a lower-level auditory one. 
This is true in so far as adopting the deliberate strategy of listening to speech as if it were 
nonspeech (which is often difficult to achieve) may have the effect of eliminating certain forms of 
integration or segregation. It seems to be difficult or impossible to disengage phonetic processes
through conscious strategies within the speech mode (e.g., by linguistic parsing; Repp, 1985a, 
1985b). Moreover, it has been suggested (Liberman & Mattingly, 1985) that some speech-specific 
functions do not really represent a "higher" level of perception but rather a mode of operation
that, because of its biological significance, takes precedence over nonspeech perception, and if
so, these functions may indeed be difficult to manipulate. On the other hand, in the auditory 
(nonspeech) mode listeners often have a variety of perceptual strategies available, especially when 
there are few ecological constraints on the stimulation, even though certain functions of peripheral 
auditory processing are surely obligatory. Thus, although it is useful to gather information about 
the relative flexibility of a process, this may not bear directly on the question of speech-specificity, 
as both speech and nonspeech perception are likely to involve levels of varying rigidity. 

One final prefatory remark: Although one may legitimately talk about the integration of
syllables into words and of words into sentences, or about the segregation of syntactic constituents 
from each other, I am not going to consider such higher linguistic processes in the present review. 
By speech perception I mean primarily the perception of phonetic structure without regard to
lexical status or meaning, and my review is restricted accordingly. 

II. INTEGRATION 

The function of integrative processes is to provide coherence among parts of the input that
"belong together" according to some perceptual rule or criterion. Auditory integration occurs
within the physical dimensions of time, (spectral) frequency, and even space (in the case of
artificially split sources); thus it creates temporal, spectral, and spatial coherence of sound sources.
In part this is due to the limited resolution of the auditory system along each of these dimensions,
but auditory events will often cohere even when there are discriminable changes within them.
The larger these changes are, the more noteworthy the integrative process will seem to us. The
perception of phonetic structure involves, in addition, integration of relevant information across all
physical dimensions of the speech signal, a function requiring higher-level perceptual or cognitive
mechanisms.

A. Temporal Integration 

Basic processes of sensory integration and auditory organization ensure the temporal coher- 
ence of any relatively homogeneous auditory input, including components of speech. This form of 
integration is so obvious as to hardly deserve comment. Thus, for example, successive pitch peri- 
ods of a vowel are perceived as belonging together (i.e., as a single vowel, not two or many) even 
though their duration and spectral composition may change as a function of intonation, diph- 
thongization, and coarticulation. While there may be a physical basis for subdividing a sound 
into smaller units such as individual pitch pulses or transition versus steady state, the rate and 
extent of change from one unit to the next are too small to disrupt sensory integration. Never- 
theless, changes occurring within such units (e.g., transitions in a vowel or fricative noise) may 
have perceptual effects. That is, perception of temporal coherence does not imply insensitivity to 
changes over time, only that these changes are not large enough to cause perceptual segregation. 

1. Growth of Loudness

Temporal integration at this most elementary level has the consequence that, as the duration 
of a relatively homogeneous sound increases, its perceived loudness or perceptual prominence will 
also increase, up to a certain limit. In psychoacoustic research, the lowering of the detection
threshold and the growth of loudness with increasing stimulus duration are well-established phe- 
nomena (see, e.g., Cowan, in press; Zwislocki, 1969). The time constant of the (exponential) 
integration function is about 200 ms, which encompasses the durations of virtually all relatively 
homogeneous speech events. While loudness judgments or explicit threshold measurements are 
uncommon in speech perception research, the effect of an increase in the duration of a signal
portion can be shown to be phonetically equivalent to that of an increase in its intensity, especially
when the relevant signal portion is brief. 
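
The time-intensity equivalence described above can be illustrated with a minimal sketch. It assumes a simple exponential (leaky) integrator of intensity with the 200 ms time constant mentioned in the text; the function names, and the idea that integrated intensity alone determines perceptual prominence, are illustrative assumptions rather than the model used in any of the cited studies.

```python
import math

TAU_MS = 200.0  # time constant of the integration function cited in the text


def integrated_fraction(duration_ms, tau_ms=TAU_MS):
    """Fraction of the asymptotic integrated response reached after duration_ms."""
    return 1.0 - math.exp(-duration_ms / tau_ms)


def equivalent_level_change_db(d1_ms, d2_ms, tau_ms=TAU_MS):
    """Level change (dB) equivalent to lengthening a sound from d1_ms to d2_ms.

    Assumption: two sounds are equivalent when their integrated
    intensities match, so the ratio of integrated fractions is
    expressed as an intensity ratio in decibels.
    """
    ratio = integrated_fraction(d2_ms, tau_ms) / integrated_fraction(d1_ms, tau_ms)
    return 10.0 * math.log10(ratio)


# Doubling a brief portion is worth more decibels than doubling a long one.
brief = equivalent_level_change_db(20.0, 40.0)
long_ = equivalent_level_change_db(200.0, 400.0)
print(f"20 -> 40 ms: {brief:.2f} dB; 200 -> 400 ms: {long_:.2f} dB")
```

Under these assumptions the duration-intensity trade is strongest for brief signal portions and levels off as duration approaches and exceeds the time constant.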

One example is provided by studies in which the duration and relative intensity of aspiration
noise were varied orthogonally as cues to the voicing distinction in synthetic syllable-initial English 
stop consonants (Darwin & Seton, 1983; Repp, 1979b). Although the trading function obtained 
was much steeper than the typical auditory temporal integration function, it bore some similarity 
to integration functions obtained in an auditory backward masking situation (Wright, 1964), 
which is not unreasonable in view of the following vowel. It seems likely that the observed 
time-intensity reciprocity reflects basic properties of the auditory system, rather than
speech-specific processes. Indirect support for this hypothesis comes from a study showing that the
trading relation between aspiration duration and intensity holds regardless of whether or not
listeners can rely on phonemic distinctions in discriminating speech stimuli (Repp, 1983b). In
another recent study, stop consonant release burst duration and intensity were varied in separate
experiments as cues to stop consonant manner in /s/-stop clusters (Repp, 1984c). Since both
parameters proved to be perceptually relevant, a trading relation between them was implied. An
analogous conclusion may be drawn from an older informal study by Lisker (1918), in which the 
duration and intensity of stop closure voicing were varied as cues to the perceived voicing status 
of an intervocalic stop consonant. 
2. Auditory Short-term Adaptation

An effect closely related to temporal integration is that the auditory nerve fibers responsive to
a continuous sound become increasingly adapted. Auditory adaptation is a topic of great interest
to psychoacousticians and auditory physiologists, who have identified at least three different time
constants of adaptation in animals (see, e.g., Eggermont, 1985). So-called auditory short-term
adaptation, with a time constant of about 60 ms, seems the most relevant to phonetic perception.
Although ongoing adaptation seems to have no direct perceptual consequences, the recovery of
auditory nerve fibers following the offset of a relatively homogeneous stimulus results in reduced
sensitivity to other, spectrally similar inputs for a short time period. Consequently, the auditory
representation of a speech component whose spectrum overlaps that of a preceding segment will
be modified. A striking demonstration of such an interaction was provided by Delgutte (1980;
Delgutte & Kiang, 1984) in recordings from cats' auditory nerves responding to synthetic /ba/
and /ma/ syllables. Even though the two syllables were identical except for the nasal murmur
in /ma/, the auditory response at vowel onset was very different. The murmur, having strong
spectral components in the low-frequency range, effectively acted as a high-pass filter, reducing
the neural response in the low-frequency region at vowel onset. Recent experiments suggest, 
however, that this particular auditory interaction has no important consequences for perception of 
nasal consonants under normal listening conditions (Repp, 1987a). In a more artificial situation, 
Summerfield, Haggard, Foster, and Gray (1984) and Summerfield and Assmann (1987) have
demonstrated an auditory aftereffect attributed to short-term adaptation: A sound with a uniform
spectrum was perceived as a vowel when preceded by a sound whose spectrum was the complement
of the perceived vowel's spectrum. Generalizing to natural speech, these authors pointed out that
auditory adaptation effectively enhances spectral change and thus may aid phonetic perception
in adverse listening conditions. 
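
The complement-spectrum aftereffect can be mimicked with a toy calculation. The sketch below assumes that each frequency channel's response to a test sound is reduced in proportion to how strongly the preceding sound drove that channel; the channel levels, the `adaptation` strength, and the function name are all hypothetical and are not taken from the cited experiments.

```python
def adapted_response(test_db, precursor_db, adaptation=0.5):
    """Per-channel response to a test sound that follows a precursor.

    Each channel is attenuated in proportion to the precursor level in
    that channel, measured relative to the precursor's weakest channel.
    `adaptation` is a hypothetical strength, not a measured value.
    """
    floor = min(precursor_db)
    return [t - adaptation * (p - floor) for t, p in zip(test_db, precursor_db)]


# Hypothetical levels (dB) in seven frequency channels. The precursor has
# valleys in channels 1 and 4, i.e., it is the complement of a sound with
# two spectral peaks in those channels.
precursor = [60, 40, 60, 60, 40, 60, 60]
flat_test = [50] * 7  # uniform spectrum

response = adapted_response(flat_test, precursor)
peaks = [i for i, r in enumerate(response) if r == max(response)]
print(response, peaks)
```

In this toy model the uniform test spectrum acquires response peaks exactly where the precursor had valleys, which is the sense in which adaptation enhances spectral change.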

One general lesson to be learned from psychoacoustic research on temporal integration,
adaptation, and other auditory interactions is that adjacent portions of the speech signal should not
be thought of as mutually independent in the auditory system. Whenever a particular component
is singled out for attention by careful analytic listening (to the extent that this is possible),
influences of surrounding context on the perceived sound must be reckoned with. It is important
to keep in mind, however, that listeners normally do not listen analytically but rather attend to
the continuous pattern of speech. All peripheral auditory transformations are a natural part of
the pattern and, because of past learning, are also represented in a listener's long-term memory
of phonetic norms, which provide the criteria for phonemic classification in a language. Since
auditory input and central reference both incorporate the distortions imposed by the peripheral
auditory system, these distortions cannot be said to either help or hinder speech perception (see
Repp, 1987b). Only a change in auditory transformations, as might be caused by simulated or real
hearing impairment, would prove disturbing to listeners. In normal speech perception, peripheral
auditory processes probably do not play a very important role.

B. Spectral Integration 

Most speech sounds have complex spectra determined by the resonance frequencies of the
vocal tract. Formants are usually visible as prominent energy bands in a spectrogram or as peaks
in a spectral cross section. Why are these bands perceived as a single sound with a complex timbre
and not as separate sounds with simpler qualities? Why, indeed, are the individual harmonics of
periodic speech sounds not heard as so many simultaneous tones? Even though these questions are
provoked by our instrumental and visual methods of spectral analysis, they are not unreasonable,
since the ear operates essentially as a frequency analyzer. One answer to these questions is that 
we do process these spectral components, only we are not conscious of them and find it difficult 
to focus selectively on them when asked to do so. Multidimensional statistical analyses of vowel 
similarity judgments have confirmed that the lower formants function as perceptually relevant 
dimensions, even though they seem to blend into a complex auditory quality (e.g., Fox, 1983; Pols, 
van der Kamp, & Plomp, 1969; Rakerd & Verbrugge, 1985), and psychoacoustic pitch matching
tasks have revealed that listeners can detect a number of lower harmonics in a complex periodic
sound (e.g., Peters, Moore, & Glasberg, 1983; Plomp, 1964). Some central integrative function
must be responsible for the perceptual coherence and unity of all these spectral components. 

1. Critical Bands

Some spectral integration does take place in the peripheral auditory system. A large amount 
of psychoacoustic research has established the concept of critical bands, i.e., frequency regions 
over which spectral energy is integrated, and whose width increases with frequency in a roughly 
logarithmic fashion (Moore & Glasberg, 1983; Zwicker & Terhardt, 1980). It is now quite common
to represent speech spectra on a critical-band frequency scale (the Bark scale) to better take 
account of the resolving power of the auditory system. However, critical bands cannot account 
for the fact that formants are integrated into a unitary percept, because the lower formants of
speech are usually several critical bands apart, and thus potentially separable. Even the lower 
harmonics, especially of female and child speech, are spaced more than 1 Bark apart. Critical 
bands may explain why higher harmonics and higher formants are not well resolved auditorily, 
but these spectral components do not contribute much phonetic information. 
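
The spacing argument above can be checked numerically. The sketch below uses the analytic approximations of critical-band rate and critical bandwidth published by Zwicker and Terhardt (1980); the formant and fundamental frequencies in the example are hypothetical round values chosen for illustration, not measurements from any study cited here.

```python
import math


def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker & Terhardt, 1980, approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def critical_bandwidth_hz(f_hz):
    """Critical bandwidth in Hz (Zwicker & Terhardt, 1980, approximation)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (f_hz / 1000.0) ** 2) ** 0.69


# Hypothetical /i/-like vowel: F1 = 300 Hz, F2 = 2300 Hz.
formant_gap = hz_to_bark(2300.0) - hz_to_bark(300.0)

# First two harmonics of a hypothetical 220 Hz (female) voice.
harmonic_gap = hz_to_bark(440.0) - hz_to_bark(220.0)

print(f"F1-F2 separation: {formant_gap:.1f} Bark")
print(f"H1-H2 separation: {harmonic_gap:.1f} Bark")
print(f"Critical bandwidth near 300 Hz: {critical_bandwidth_hz(300.0):.0f} Hz")
```

With these example values both the formants and the lowest harmonics lie more than one critical band apart, so peripheral integration alone would not merge them.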

It is difficult, therefore, to point to any direct consequences of critical band limitations for
speech perception, except in hearing-impaired listeners, whose critical bandwidths are abnormally 
large. A recent study by Celmer and Bienvenue (1987) may serve as an example. These investi- 
gators digitized speech materials, degraded their spectra by simulating critical band integration 
ranging from one-half to seven times the normal widths, converted the manipulated spectra back 
into sound, and presented them to groups of normal listeners and to hearing-impaired listeners
believed to have abnormally wide critical bandwidths according to independent psychoacoustic 
tests. The results showed that the degree of critical bandwidth filtering required to cause an 
intelligibility decrement was directly related to the subjects' measured critical bandwidth. Thus, 
normal subjects were sensitive to filtering at twice the normal bandwidths, while hearing-impaired 
subjects, though their intelligibility scores were lower to begin with, tolerated up to five times 
the normal bandwidths before any decrement in intelligibility occurred. Many other studies, too
numerous to review here, have examined correlations between measures of critical bandwidth (or 
frequency resolution) and measures of speech perception in hearing-impaired individuals, with 
mixed results (see, e.g., Dreschler & Plomp, 1980; Stelmachowicz, Jesteadt, Gorga, & Mott,
1985). The looseness of the correlation may be accounted for by the facts that speech per- 
ception engages higher-level functions that help overcome peripheral limitations, often requires 
only relatively coarse spectral resolution, and relies on other physical parameters besides spectral 
structure. 
2. Integration of Harmonics

Given that the lower harmonics of a periodic speech sound are not automatically integrated 
by the peripheral auditory system, not to mention the lower formants themselves, the question 
of why they are grouped together in perception still needs to be answered. The most general 
answer is that they share a "common fate": They usually start and end at the same time; they 
are at integral multiples of the fundamental frequency; they have similar amplitude envelopes; 
and there is no alternative grouping that suggests itself. Below I will have more to say about 
the factors that may cause segregation of harmonics. Principles of auditory organization have 
received much attention in recent years (see, e.g., Bregman, 1978; Darwin, 1981; Weintraub,
1987), and one interesting conclusion from that research is that, even at such a relatively early 
stage in auditory processing, speech-specific criteria begin to play a role. They are speech-specific 
in the sense that a listener's tacit knowledge of what makes a good speech pattern influences 
the perceptual grouping of auditory components, as presumably does knowledge of other familiar 
auditory patterns. Yet another answer to the question of why harmonics (and formants) are 
grouped together is, therefore: They make a speech sound — that is, a complex sound that could 
possibly have emanated from a human vocal tract. 

If it is the case that formant frequencies are salient parameters of speech perception (an 
assumption that is not made by some researchers who favor a whole-spectrum approach; e.g., 
Bladon, 1982; Stevens & Blumstein, 1981), then it is of interest to ask how listeners estimate 
the actual resonance frequencies of the vocal tract from the energy distribution in the relevant 
spectral region. This question is especially pertinent with respect to the first formant (F1) in
periodic speech sounds, for which critical bands are narrow and frequency difference limens are
small. This means that the actual F1 frequency often falls between auditorily resolvable
harmonics. Early work by Mushnikov and Chistovich (1973) suggested that the brain takes the
frequency of the single most intense harmonic as the estimate of F1. Later studies by Carlson,
Fant, and Granstrom (1975) and Assmann and Nearey (1987), however, have indicated that the
subjective F1 frequency corresponds to a weighted average of the two most intense harmonics,
and Darwin and Gardner (1985) have shown that the perceptual boundary between /i/ and /e/
can be affected by the intensity of as many as five harmonics between 250 and 750 Hz, spaced
125 Hz apart. This indicates that the weighting function applied by the speech perception system 
in estimating formant frequencies extends over several critical bands (which are 100 Hz or less 
in this frequency region). The function is also asymmetric, giving more weight to higher than to 
lower harmonics, which may reflect a speech-specific constraint related to the fact that changes 
in actual F1 frequency affect primarily the amplitudes of the higher harmonics in the vicinity of
the spectral peak (Assmann & Nearey, 1987). Listeners thus seem to have tacit knowledge of the
physical constraints on the shape of the vocal tract transfer function (Darwin, 1984).
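
The two simplest F1 estimation hypotheses discussed above can be contrasted in a minimal sketch. It assumes linear amplitudes as weights and a hypothetical set of harmonics spaced 125 Hz apart; the actual weighting functions reported in the cited studies (which are asymmetric and may span more than two harmonics) are not reproduced here, and both function names are illustrative.

```python
def f1_peak_harmonic_hz(harmonics):
    """Estimate F1 as the frequency of the single most intense harmonic."""
    return max(harmonics, key=lambda h: h[1])[0]


def f1_weighted_average_hz(harmonics):
    """Estimate F1 as the amplitude-weighted mean of the two most intense harmonics.

    `harmonics` is a list of (frequency_hz, linear_amplitude) pairs.
    Using linear amplitude as the weight is an assumption of this sketch.
    """
    top_two = sorted(harmonics, key=lambda h: h[1], reverse=True)[:2]
    total = sum(amp for _, amp in top_two)
    return sum(freq * amp for freq, amp in top_two) / total


# Hypothetical harmonic amplitudes around a spectral peak (F0 = 125 Hz).
spectrum = [(250.0, 0.4), (375.0, 1.0), (500.0, 0.8), (625.0, 0.3)]

print(f1_peak_harmonic_hz(spectrum))
print(f1_weighted_average_hz(spectrum))
```

The weighted average falls between the two strongest harmonics, so it can track a resonance that lies between resolvable harmonics, which the single-peak estimate cannot.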

3. Integration of Formants

This leads us to the more general question of whether the speech perception system integrates
over adjacent formants (or any two peaks in the spectrum) when they are close in frequency but 
not within a critical band. It has been known for a long time that reasonable approximations to 
virtually all vowels can be achieved in synthesis with just two formants, and even with a single 
formant in the case of back vowels (Delattre, Liberman, Cooper, & Gerstman, 1952). Delattre et
al. noted that the approximations were best when the two formants replaced by a single formant
were close in frequency (F1 and F2 in high back vowels; F2 and F3 in high front vowels), and
that the best single-formant substitute tended to be intermediate in frequency, suggesting that
closely adjacent vowel formants form a perceptual composite or average. This idea was later
elaborated by the Stockholm research group (Carlson, Granstrom, & Fant, 1970; Carlson et al.,
1975) into the concept of F2', a hypothetical effective formant intermediate in frequency between
F2 and F3 (except for /i/, where it falls between F3 and F4). These authors developed a formula
for calculating F2' from F1, F2, F3, and F4, which gave good approximations to the results of
perceptual matching experiments.

More recently, Chistovich and her collaborators have conducted a number of experiments on
the "center of gravity" effect, the demonstrable phonetic equivalence of a single formant to two
adjacent formants of varying frequency and/or intensity (see Chistovich, 1985, for a review). One 
important question concerned the critical frequency separation of the two formants beyond which 
no satisfactory single-formant match could be achieved; it turned out to be about 3.5 Bark, that 
is, 3.5 critical bands (Chistovich & Lublinskaja, 1979). This finding has received considerable 
attention. For example, the 3.5 Bark limit has been related to the separation and boundaries 
between English vowel categories in acoustic space (Syrdal & Gopal, 1986), and it has been used, 
together with the center of gravity concept, to explain perceived shifts in the height of nasalized 
vowels, which often have two spectral prominences in the F1 region (Beddor, 1984).
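
A minimal sketch of the center of gravity idea follows. It assumes that two spectral peaks merge only when they lie within 3.5 Bark of each other, and that the merged percept sits at their amplitude-weighted mean on the Bark scale. The weighting rule, the example frequencies, and the function names are illustrative assumptions; the Bark conversion is the Zwicker and Terhardt (1980) approximation, and none of this reproduces the matching procedure of the cited experiments.

```python
import math

CRITICAL_DISTANCE_BARK = 3.5  # limit reported by Chistovich & Lublinskaja (1979)


def hz_to_bark(f_hz):
    """Critical-band rate in Bark (Zwicker & Terhardt, 1980, approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def center_of_gravity_bark(f_a_hz, amp_a, f_b_hz, amp_b):
    """Amplitude-weighted mean position (Bark) of two peaks, or None.

    Returns None when the peaks are more than 3.5 Bark apart, in which
    case no single-peak equivalent is assumed to exist.
    """
    z_a, z_b = hz_to_bark(f_a_hz), hz_to_bark(f_b_hz)
    if abs(z_a - z_b) > CRITICAL_DISTANCE_BARK:
        return None
    return (z_a * amp_a + z_b * amp_b) / (amp_a + amp_b)


# Hypothetical front vowel: F2 and F3 are close, F1 and F2 are far apart.
close_pair = center_of_gravity_bark(2200.0, 1.0, 2900.0, 1.0)
far_pair = center_of_gravity_bark(300.0, 1.0, 2200.0, 1.0)
print(close_pair, far_pair)
```

Raising the amplitude of one peak in a merged pair shifts the computed center toward that peak, which corresponds to the intensity dependence of the effect described above.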

It is noteworthy, however, that already Delattre et al. (1952) were unable to achieve satisfac- 
tory single-formant matches to arbitrary two-formant patterns that did not correspond to familiar 
vowel categories. This finding, which was replicated by Traunmuller (1982, 1984b), suggests that 
spectral integration over 3.5 Bark is tied to the perception of phonetic (or phonemic) categories. 
Specifically, it may reflect the resolution of the auditory long-term memory in which phonemic ref- 
erence patterns are stored (Traunmuller, 1984b). Indeed, it is an open question whether the 3.5 
Bark limit explains the acoustic spacing of vowel categories (Syrdal & Gopal, 1986), or whether
it is the other way around. A recent study by Schwartz and Escudier (1987), however, provides
evidence that the 3.5 Bark limit is not the consequence of phonemic categorization. Their data
suggest that there is indeed a higher level of auditory representation that serves phonetic
classification and includes wide-band spectral integration. The cause of this integration is unknown at
present. 

4. Redintegration of Artificially Separated Spectral Components

Ultimately, it must be a higher-level process that decides whether a spectral array constitutes 
a single event or several. Integration over the whole spectrum is the natural state of affairs, since 
most natural sounds have complex spectra and could not easily be recognized if integration were 
not the default operation. Even an unrelated set of pure tones is perceived as a single complex 
structure when sounded simultaneously, as long as no alternative organizations suggest themselves
(e.g., Green, 1983; Kubovy, 1981). Such integration is disrupted by temporal or spatial separation 
of signal components, however; for example, the "auditory profiles" studied by Green and his 
coworkers are not well perceived when the sinusoidal components are divided between the two 
earphone channels (Green & Kidd, 1983). With familiar natural events such as speech, perceptual
coherence of spectral components may be centrally guided and hence greater and more resistant
to disruption. One possible example of this is the phenomenon called spectral-temporal fusion
(Cutting, 1976) or duplex perception (Liberman, 1979), which has been studied extensively in 
recent years. 
Precursors of this research are found in experiments where the formants of synthetic syllables
were separated and presented to opposite ears (e.g., F1 to one ear and F2 and F3 to the other).
It was found early on that this presentation gave rise to an intact speech percept, with little or 
no awareness of separate stimuli in the two ears (Broadbent & Ladefoged, 1957). Similar fusion 
of dichotic stimuli into a single perceived sound is observed with complete synthetic syllables in 
the two ears (e.g., Repp, 1976b) and even with harmonically related tones (e.g., Deutsch, 1978).
More surprising is the finding that perceptual integration continues to occur even when listeners 
are aware of separate stimuli in the two ears. Thus, Cutting (1976) presented the dichotically 
separated formants at different fundamental frequencies and observed that subjects still reported 
the percept corresponding to the combination of the formants. (For similar effects with diotic 
presentation, see Darwin, 1981.) In what is now called the duplex perception paradigm, Rand 
(1974) presented the formant transitions distinguishing two synthetic consonant-vowel syllables 
(such as /da/ and /ga/) to one ear and the remainder common to the two syllables (the "base") 
to the opposite ear. In this situation, listeners continue to report one or the other syllable 
depending on which formant transition is presented, even though that transition is also heard 
simultaneously as a lateralized nonspeech "chirp." The intact syllable (not the base) is heard in 
the ear receiving the base. Thus, subjectively at least, auditory fusion takes place despite the 
auditory segregation of the chirp — a paradoxical situation. This fusion continues to operate when 
the two signal components are presented at different fundamental frequencies (Cutting, 1976) or 
with slight temporal offsets (Repp & Bentin, 1984). A very similar phenomenon can be produced 
diotically by making the critical formant transition audible through temporal offset (Repp & 
Bentin, 1984), amplification (Whalen & Liberman, 1987), or different fundamental frequencies 
(informal observations). None of these manipulations, within certain limits, destroys the fused 
speech percept. 

One interpretation of these findings (see, e.g., Liberman & Mattingly, 1985) is that a special- 
ized speech "module" is responsible for the perceptual integration and apparent fusion, whereas 
the general auditory system is responsible for the separate chirp percept. Bregman (1987), on the 
other hand, has proposed that the paradoxical co-occurrence of fusion and nonfusion arises from 
conflicting cues for integration and segregation in the general process of "auditory scene analysis." 
He and other students of auditory organization have stressed the relative independence of What 
and Where decisions in auditory perception (Bregman & Steiger, 1980; Darwin, 1981; Deutsch 
& Roll, 1976; Weintraub, 1987). It seems that auditory components that have been segregated 
can nevertheless be recombined in the perception and classification of familiar sound structures. 
That this recombination in the duplex perception paradigm is genuinely perceptual and not cog- 
nitive is indicated not only by the subjective impression of an intact syllable but by the fact that 
the components (chirp and base) presented by themselves generally do not suggest the "correct" 
phonetic percept (Repp, Milburn, & Ashkenas, 1983). 

C. Integration of Phonetic Information 

Speech consists of a sequence of diverse sound segments that, as everyone knows, do not 
correspond directly to linguistic units. Changes in spectral structure are often very rapid and 
lead to great spectral heterogeneity over time. Equally striking is the alternation of qualitatively 
different sound types (periodic vs. aperiodic, as well as silence). Nevertheless, listeners perceive 
a coherent event, and thus believe speech to be a coherent stream of sounds. Since there is 
absolutely no reason to assume that very disparate sound structures are automatically integrated 
by the auditory system, the subjective impression of auditory continuity must be due to higher- 
level articulatory and linguistic properties of cohesiveness that capture the listener's attention — a 
kind of categorical perception (see Repp, 1984a). 

How can our brain perform integrative feats in speech perception that exceed the capabilities 
of the auditory system? One possibility is that there exists a biological specialization in humans, 
a "speech module," which performs this task (see Fodor, 1983; Liberman & Mattingly, 1985). 
Alternatively, the answer may be mental precompilation as a consequence of perceptual learning - 
an assembled module, as it were (cf. Klatt, 1979). What distinguishes speech perception from the 
auditory perception of arbitrary tones and noises (but not necessarily from the perception of other 
ecologically significant auditory events) is that the input can be mapped onto meaningful units 
of various sizes. The integration of the auditory components relating to each unit represented 
in the perceiver's long-term memory has taken place long ago during the process of speech and 
language acquisition; it may be instantiated neurally as a flexible (context-sensitive) system of 
interconnections (Elman & McClelland, 1984; Klatt, 1979). These precompiled units then enable 
a perceiver to immediately relate a number of functionally independent auditory features to a 
common phonetic percept. Some interesting (and arduous) attempts to simulate this process of 
perceptual learning and unit formation in nonspeech auditory perception have been reviewed by 
Watson and Foyle (1985), who stress the importance of central processes in the identification 
and discrimination of complex stimuli. Experienced Morse code operators exhibit similar skills 
of "integrating" the acoustic dots and dashes into larger units (Bryan & Harter, 1899), and 
so do probably perceivers of other meaningful acoustic events in our environment (see Jenkins, 
1985; Warren & Verbrugge, 1984), although in none of these instances does the auditory stimulus 
structure recede as much from awareness as it does in speech perception. From this perspective, 
speech is unique not so much because it requires specialized perceptual and cognitive functions 
but because it is structurally different, having originated in the articulatory motor system. Our 
biological specialization may simply lie in the fact that we can mentally represent a system that 
complex. 

1. "Integrated" Auditory Properties 

The ability to integrate over dynamically changing sound patterns has occasionally been at- 
tributed to the auditory system. Thus, Stevens and Blumstein (1978, 1981; Blumstein & Stevens, 
1980) hypothesized that the onset spectrum following the release of stop consonants provides 
invariant acoustic correlates of place of articulation. Since there are often rapid spectral changes 
immediately following the release, and since a spectrum cannot be computed instantaneously, 
the hypothetical auditory onset spectrum must derive from an integrative process. Stevens and 
Blumstein hypothesized that the human auditory system integrates over about 25 ms and thus 
extracts the acoustic property relevant to place of articulation. 
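
The notion of an onset spectrum obtained by integrating over a short interval can be made concrete with a small sketch. The Python code below is only an illustration under stated assumptions: it applies a Hann taper to the first 25 ms of a sampled signal and computes a plain discrete Fourier transform. The function names (`onset_spectrum`, `peak_frequency`), the choice of taper, and the test tone are hypothetical and are not taken from Stevens and Blumstein's actual analysis procedure.

```python
import cmath
import math


def onset_spectrum(samples, sample_rate, window_ms=25.0):
    """Magnitude spectrum of the first `window_ms` milliseconds of a signal.

    Returns a list of (frequency_hz, magnitude) pairs. The 25 ms default
    mirrors the integration interval mentioned in the text; everything
    else (Hann taper, plain DFT) is an illustrative assumption.
    """
    n = int(round(sample_rate * window_ms / 1000.0))
    if n < 2 or n > len(samples):
        raise ValueError("signal shorter than the integration window")
    # Taper the frame so that its edges do not dominate the spectrum.
    frame = [
        samples[t] * 0.5 * (1.0 - math.cos(2.0 * math.pi * t / n))
        for t in range(n)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        # Naive DFT: adequate for a few hundred samples.
        acc = sum(
            frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append((k * sample_rate / n, abs(acc)))
    return spectrum


def peak_frequency(spectrum):
    """Frequency (Hz) of the largest-magnitude component."""
    return max(spectrum, key=lambda pair: pair[1])[0]


# Hypothetical test signal: a steady 1000 Hz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2.0 * math.pi * 1000.0 * t / fs) for t in range(400)]
peak = peak_frequency(onset_spectrum(tone, fs))
```

For a steady tone the result is the same wherever the window is placed; for a stop release, where the spectrum changes rapidly, the outcome depends on the window length, which is the point at issue in the discussion that follows.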

The work of Stevens and Blumstein has come under criticism in recent years. Kewley-Port 
(1983) has argued that, for all we know, the auditory system tracks spectral changes over time 
intervals shorter than 25 ms and presumably delivers information about these changes to phonetic 
decision mechanisms. A perceptual study by Kewley-Port, Pisoni, and Studdert-Kennedy (1983) 
has suggested that listeners are indeed sensitive to spectral changes immediately following the 
release of stop consonants (see also Blumstein & Stevens, 1980). The onset spectra themselves do 
not appear to be as invariant as was originally claimed (see Lahiri, Gewirth, & Blumstein, 1984; 
Suomi, 1985). Blumstein and her students meanwhile have abandoned the search for invariant 
properties in onset spectra and have instead gone on to define integrated properties based on 
the relationship between spectra or intensity measures obtained some interval apart (Jongman, 
Blumstein, & Lahiri, 1985; Kurowski & Blumstein, in press; Lahiri et al., 1984). Even though 
some of these properties are quite complex, their derivation is still attributed to the auditory 
system by these researchers. However, since it seems highly implausible that there are general 
auditory functions that yield so specialized a result, the epithet "auditory" should perhaps be 
understood as referring merely to the input modality. Clearly, out of the infinity of possibilities, 
particular relational properties are selected on the basis of phonetic relevance. The integrative 
computational process thus is specific to speech perception. 

2. Integration of Silence and Other Signal Components 

Even though it seems unlikely that the auditory system integrates over spectral variation in 
the speech signal lasting tens of milliseconds, this hypothesis has some measure of plausibility, 
given the basic continuity of the signal changes. There are many more abrupt changes in the 
speech signal, however, such as changes in source (from voiced to voiceless, or vice versa), in 
spectrum (such as /z/ followed by /u/), and in intensity (into and out of closures filled with nasal 
murmur, voicing, or silence), usually in several of these dimensions simultaneously. It would 
seem absurd to attribute to the auditory system the capability to integrate across such dramatic 
signal changes, since the task of auditory perception is to detect changes, not to conceal them. 
Nevertheless, there is ample evidence from perceptual experiments that listeners can integrate 
phonetic information across such acoustic discontinuities in the signal. Clearly, this integration 
must be a higher-level function in the service of speech perception. 

Perhaps the most striking instance is the perception of silence in speech. (I have in mind 
brief silent intervals of up to 200 ms duration, not longer pauses.) From an auditory perspective, 
silence is the absence of energy, a gap, an interruption that separates the signal portions to be 
perceived. In speech perception, however, silence is bridged by, and participates in, integrative 
processes. Rather than being the neutral backdrop for the theater of auditory events, silence is 
informationally equivalent to energy-carrying signal portions. Relative duration of silence has 
been shown to be a cue for the perception of stop consonant voicing (Kohler, 1979; Lisker, 
1957; Port, 1979), manner (Bailey & Summerfield, 1980; Repp, 1984c; Repp, Liberman, Eccardt, 
& Pesetsky, 1978), and place of articulation (Bailey & Summerfield, 1980; Port, 1979; Repp, 
1984b). Why does silence function in this way in speech? The answer must be that it is an 
integral part of the acoustic patterns that a human listener has learned to recognize. Being an 
acoustic consequence of the oral closure connected with (voiceless) stop consonants, it has become 
a defining characteristic of that manner class. Lawful variations in its duration as a function of 
voicing status or place of articulation also have assumed the function of perceptual "cues." A 
listener's long-term representation of the acoustic pattern corresponding to a stop consonant 
thus includes the spectro-temporal properties of the signals preceding and following the closure 
as well as the closure itself. (The precise nature of that mental representation, or rather of our 
description of it, need not concern us here; it suffices to note that listeners behave as if they knew 
what acoustic pattern to expect.) The silence thus is not really "actively" integrated with the 
surrounding signal portions; rather, the integration has already taken place during past perceptual 
learning and is embodied in the perceiver's long-term knowledge of speech patterns to which the 
input is referred during perception. 

Not only is silence integrated (in the sense just discussed) with surrounding signal portions in 
phonetic perception, but acoustically rather different components of the signal are integrated with 
each other. Thus, for example, the spectrum of a fricative noise and the adjacent vocalic formant 
transitions both contribute to perception of a prevocalic fricative consonant (e.g., Mann & Repp, 
1980; Whalen, 1981), the formant transitions in and out of a closure contribute to stop consonant 
perception (Tartter, Kat, Samuel, & Repp, 1983), etc. Just as articulation distributes acoustic 
information about individual phonemes over time, perceptual integrative functions collect that 
information and relate it to internal criteria for linguistic category membership. An especially 
interesting demonstration of this was provided quite recently by Tomiak, Mullennix, and Sawusch 
(1987). Using a well-known technique (Garner, 1974) for testing listeners' ability to selectively 
attend to stimulus dimensions, they showed that the "fricative noise" and "vowel" portions of 
noise-tone analogs to fricative-vowel syllables were processed separately by subjects who perceived 
the stimuli as nonspeech sounds, but were processed integrally by subjects who had been told 
that the stimuli represented syllables. These latter subjects were unable to selectively attend to 
either of the two stimulus portions, even though coarticulatory interactions were not present in 
the noise-tone stimuli. Listeners in the "speech mode" thus seem to process auditory components 
of speech in an integrative manner even if some of the information to be integrated is not actually 
there; they are scanning for it, as it were. 

Independent aspects of the speech signal that contribute to the same phonemic decision 
combine according to a simple decision rule, as demonstrated in many experiments by Massaro 
(e.g., Derr & Massaro, 1980; Massaro & Oden, 1980). It is possible to trade various of these cues, 
changing the physical parameters of one while changing those of another in the opposite direction, 
without altering the phonemic percept. This phenomenon, often referred to as "phonetic trading 
relations," has been demonstrated in a large number of studies (see review by Repp, 1982). Fitch, 
Halwes, Ericsson, and Liberman (1980) showed that listeners have great difficulty discriminat- 
ing two phonemically equivalent stimuli created by playing off two cues against each other, and 
they argued that this reflects the operation of a special phonetic process that makes auditory 
differences unavailable to perception. Whether the process of phonetic information integration 
is speech-specific is debatable (cf. Repp, 1987b), even though it is agreed that the information 
being integrated is speech-specific. Listeners' difficulty in discriminating phonemically equivalent 
stimuli is familiar from classical categorical perception research (see review by Repp, 1984a). Ex- 
periments on phonetic trading relations that include identification and discrimination tests (Best, 
Morrongiello, & Robson, 1981; Fitch et al., 1980) are generalized categorical perception tasks, in 
which several physical parameters are varied simultaneously. If each parameter variation by itself 
is difficult to discriminate except when it cues a category distinction, then joint variations in these 
parameters will be almost as difficult to discriminate unless a phonemic contrast is perceived. This 
does not mean, however, that auditory discrimination of such variations is impossible. Appro- 
priate training and use of low-uncertainty discrimination paradigms have been shown to reduce or 
eliminate categorical perception of single dimensions (Carney, Widin, & Viemeister, 1977; Repp, 
1981), and it is likely that similar training would enable subjects to discriminate simultaneous 
variations in several cues, thus demonstrating that their integration does not take place in the 
auditory system (see also Best et al., 1981). There is also evidence that certain phonetic trading 
relations occur only when listeners can make phonemic distinctions, but not within phonemic 
categories (Repp, 1983b). 

In summary, the various forms of phonetic cue integration seem to represent, for the most 
part, speech-specific functions in so far as the articulatory processes and the corresponding lin- 
guistic categories that cause the integration are specific to speech. This idea is embodied in 
Massaro's "fuzzy logical model" of phonetic decision making (Massaro & Oden, 1980), which as- 
sumes that, for each phonemic category, listeners have internal criteria for the degree of presence 
of various acoustic features in the speech signal. Diehl and his colleagues have recently argued 
that many trading relations may have a general auditory basis (Diehl, 1987; Parker, Diehl, & 
Kluender, 1986). While their research may show that some trading relations (especially those 
within a physical dimension) indeed rest on auditory interactions, this is unlikely to be true for 
the many trading relations that cut across physical dimensions. Although phonetic perception is 
certainly not immune to auditory interactions, cue integration appears to be mainly a function 
of speech-specific classification criteria. 
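
The kind of decision rule at issue can be illustrated with a brief sketch. The Python code below follows the commonly cited form of the fuzzy logical model: each cue supports each category to some degree between 0 and 1, the degrees of support are multiplied within a category, and the products are normalized across categories. The function names, the two-category setup with complementary support values, and all numerical values are assumptions made for illustration only; they are not fitted to any data set.

```python
from math import prod


def flmp_probabilities(support):
    """Relative goodness of match for each category.

    `support` maps a category label to a list of values in [0, 1], one
    per cue, expressing how well that cue matches the category.
    """
    goodness = {cat: prod(values) for cat, values in support.items()}
    total = sum(goodness.values())
    if total == 0:
        raise ValueError("no category receives any support")
    return {cat: g / total for cat, g in goodness.items()}


def two_alternative(cue_a, cue_b):
    """P(category X) when each cue supports X with s and Y with 1 - s.

    The complementary coding is a simplifying assumption for the
    two-category case.
    """
    return flmp_probabilities({
        "X": [cue_a, cue_b],
        "Y": [1.0 - cue_a, 1.0 - cue_b],
    })["X"]


# Hypothetical support values, chosen only to illustrate a trade.
neutral = two_alternative(0.5, 0.5)      # both cues ambiguous
traded = two_alternative(0.7, 0.3)       # one cue raised, the other lowered
both_strong = two_alternative(0.7, 0.7)  # both cues favor X
```

Under these assumptions, `neutral` and `traded` come out essentially equal: strengthening one cue while weakening the other by a matching amount leaves the predicted response proportion unchanged, which is the signature of a trading relation. When both cues favor the same category (`both_strong`), the predicted proportion is more extreme than either cue would yield alone.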

3. Phonetic Context Effects 

Perceivers not only integrate cues directly pertaining to a particular phoneme or complex 
of articulatory gestures, but they adapt their perceptual criteria to the surrounding phonetic 
context. Examples of such phonetic context effects are the shift in the /s/-/ʃ/ category boundary 
depending on the following vowel (Kunisaki & Fujisaki, 1977; Mann & Repp, 1980) and the shift 
in the /b/-/p/ voice-onset-time category boundary depending on the speaking rate or duration of 
the surrounding segments (Green & Miller, 1985; Miller, 1981; Summerfield, 1981). For reviews, 
see Miller (1981), Repp (1982), and Repp and Liberman (1987). As in the case of phonetic 
trading relations, some of these effects may have general auditory processing explanations; thus, 
for example, the effect of vowel duration on perception of the /ba/-/wa/ distinction (Miller & 
Liberman, 1979) probably is not speech-specific, as a comparable effect has also been obtained 
with nonspeech stimuli (Pisoni, Carrell, & Gans, 1983). Many other effects, however, seem 
to reflect listeners' tacit knowledge of coarticulatory dependencies in speech production. For 
example, the different /s/-/ʃ/ boundaries in the context of rounded and unrounded vowels may 
be related to the occurrence of anticipatory liprounding during the constriction phase in utterances 
such as "soup" but not in "sap." In a series of experiments using cross-spliced fricative noises 
and vowels, Whalen (1984; Whalen & Samuel, 1985) has shown that even when the fricative noise 
itself is quite unambiguous, subjects' reaction time in a fricative identification task is influenced 
by the following vocalic context, being slower when the fricative noise spectrum is not exactly 
what would be expected in that context (cf. the study by Tomiak et al., 1987, reviewed above). 
In an unpublished series of experiments, Repp (1978a) demonstrated an effect he dubbed "co- 
perception," which consisted of slower reaction times to decide that the two consonants are the 
same in the stimulus pair /aba/-/abi/ than in the pair /aba/-/aba/, even though the pre-closure 
(VC) portions of these synthetic VCV stimuli were identical in both cases. That is, even though 
subjects could have made their decisions after hearing /ab/ in the second member of a stimulus 
pair, they somehow had to take the CV portions of the stimuli into account and then were 
slowed down by the inequality of the vowels. All these studies show that perceivers integrate 
all information that possibly could bear on phonetic decisions, and this integration often seems 
obligatory in nature. It requires special instructions, special (nonphonetic) tasks, and usually 
some amount of training to disengage phonetic integration mechanisms in the laboratory (e.g., 
Best et al., 1981; Repp, 1980, 1981, 1985b). 

4. Cross-modal Integration 

In natural speech communication, humans make use not only of auditory but also of visual 
information, if available. Audiovisual integration at the level of phoneme perception has been a 
research topic of considerable interest since the discovery by McGurk and MacDonald (1976) that 
subjects presented with certain conflicting auditory and visual speech stimuli report that they 
"hear" what they see. Their findings have been replicated and extended in a number of studies 
(MacDonald & McGurk, 1978; Massaro & Cohen, 1983; Summerfield, 1981; and others). Massaro 
(in press; Massaro & Cohen, 1983) has shown that a general rule of information integration based 
on the degree to which signal features match expected feature values can explain audiovisual 
integration, auditory cue integration, as well as many other forms of perceptual integration outside 
the domain of speech. This suggests that we may be dealing with a general function following basic 
laws of decision theory. Liberman and his collaborators (Liberman, 1982; Repp et al., 1978), on 
the other hand, have argued that integration of speech cues, within or across modalities, occurs 
because they represent the multiple, distributed consequences of articulatory acts or gestures. 
Some internal reference to processes of speech production is thus implied, as in the "motor theory 
of speech perception" (see Liberman & Mattingly, 1985). However, this account is complementary 
rather than antithetic to Massaro's model: It is a theory of why integration occurs, whereas 
Massaro is concerned with how integration works. The phonemes of a language are articulatory 
events that have characteristic acoustic and optic consequences, and perceivers presumably have 
tacit knowledge incorporating both of these aspects. If a portion of the speech input satisfies 
certain auditory and visual criteria for phonemic category membership (as in Massaro's model), 
this also implies that the gestures characterizing a particular phoneme have been recovered (as 
in the motor theory). Whether the sensory or the articulatory aspect is stressed in a particular 
theory is largely a matter of philosophy and perhaps of economy. A complete theory must include 
both. 

Audiovisual integration at the more global level of word, sentence, and discourse compre- 
hension has, of course, been of interest for a long time in connection with hearing impairment 
and communication in noisy environments. Research on this topic has received a boost in recent 
years with the advent of modern signal processing technology and of cochlear implants. (See 
Summerfield, 1983, for a review.) The information provided by residual hearing or by electrical 
stimulation of the auditory nerve supplements that obtained from lipreading to yield enhanced 
comprehension. In many respects, these two sources of information are complementary, with the 
auditory channel providing information that is difficult to see, and vice versa. What is of special 
interest in the present context is that audiovisual comprehension performance often seems to 
exceed what might be expected from a mere combination of independent sources of information. 
Thus, Rosen, Fourcin, and Moore (1981) demonstrated that speech intelligibility is improved 
substantially when lipreading in hearing subjects is supplemented with the audible fundamen- 
tal frequency contour, or even just with a constant buzz representing the occurrence of voicing. 
(See also Breeuwer & Plomp, 1986; Grant, Ardell, Kahl, & Sparks, 1985.) Since this auditory 
component by itself provides virtually no information about phonetic structure, it must be the 
temporal relationships between the auditory and visual channels that contribute to intelligibility 
(McGrath & Summerfield, 1985). Thus audiovisual speech perception is often more than the sum 
of its parts; in terms of Massaro's (in press) model, the separate sources are integrated before 
central evaluation. The close integration of inputs from the two modalities is witnessed by anec- 
dotal reports that voicing-triggered buzz accompanying lipreading may assume phonetic qualities 
(Summerfield, in press). 

The theoretical issues raised by audiovisual integration have been discussed thoroughly by 
[...] are integrated before any categorical decisions are made. There are four ways of conceptualizing 
how this integration occurs: (1) The two channels make independent contributions to linguistic 
decisions, but temporal relationships provide a third source of information. (2) The visual in- 
formation is translated into an auditory metric of vocal tract area functions. (3) The auditory 
information is translated into a visual metric of articulatory kinematics. (4) Both are translated 
into an abstract representation of dynamic control parameters of articulation. This last-mentioned 
approach (e.g., Browman & Goldstein, 1986; Kelso, Saltzman, & Tuller, 1986) may ultimately 
provide the most economic description of speech information in both modalities, and thus may 
yield the most appropriate vocabulary in which to describe intermodal integration. 

5. Higher-level Integration 

Human listeners not only integrate auditory and visual information about a speaker's articu- 
lations, but they also bring phonotactic, lexical, syntactic, semantic, and pragmatic expectations 
to bear on their linguistic decisions, provided the auditory and/or visual input is sufficiently am- 
biguous to give room to effects of such expectations. Some well-known demonstrations of effects 
in this category are the "phoneme restoration" phenomenon discovered by Warren (1970) and 
studied more recently by Samuel (1981), in which lexical expectations fill in missing acoustic 
information, as it were; the lexical bias effect reported by Ganong (1980) and replicated by Fox 
(1984), which causes a relative shift in the category boundaries on acoustic word-nonword (e.g., 
DASH-TASH versus DASK-TASK) continua in favor of word percepts; and the "fluent restora- 
tions" in rapid shadowing of semantically anomalous passages (Marslen-Wilson, 1985). These 
phenomena, and a host of related ones often referred to as "top-down" effects, may be consid- 
ered general forms of cognitive information integration in speech perception. Indeed, Massaro 
(in press) has argued that the rules by which such higher-level information is integrated with 
the "bottom-up" information delivered by the senses are the same by which acoustic (and optic) 
speech cues are integrated. Others argue that top-down influences should be strictly separated 
from bottom-up processes— that they represent general cognitive functions that operate outside 
the autonomous speech module (Fodor, 1983; Liberman & Mattingly, 1985). According to this 
second view, integration of bottom-up cues to phoneme identity is a fundamentally different pro- 
cess from the integration of bottom-up and top-down information. My own view in this matter 
is that speech perception at every level requires domain-specific knowledge stored in a perceiver's 
long-term memory. The processes by which this knowledge is brought to bear upon the sensory 
input are part of our metaphoric representation of brain function and thus are bound to be gen- 
eral (cf. Repp, 1987b). In the absence of a radically different vocabulary in which to characterize 
the processes within a module (though such a vocabulary will perhaps emerge from the study 
of articulatory dynamics and coordination), the postulate of a speech module harks back to the 
"black box" of behaviorism. It is quite likely, of course, that phonetic perception is modular in 
the sense that integration of phonetic cues precedes, and is not directly influenced by, higher-level 
factors. This issue can be addressed empirically (see, e.g., Fodor, 1983; Ganong, 1980; Samuel, 
1981; Swinney, 1982). My point here is that integration, whether it occurs inside a module or 
outside it, is conceptually the same thing: a many-to-one mapping. Indeed, Massaro's (e.g., in 
press) extensive research suggests that the rules of information integration are independent of 
modularity. 

III. SEGREGATION 

The preceding section has illustrated the pervasiveness of integrative processes in speech 
perception. Much of perceptual and cognitive processing is convergent, with multiple sources 
of information contributing to single decisions, be they explicit or implicit. Nevertheless, we 
also need hypothetical mechanisms to prevent all information from converging onto every deci- 
sion "node." Even though a perceiver's internal criteria for linguistic category membership will 
automatically reject irrelevant information, information that does not belong is nevertheless of- 
ten potentially relevant. Thus, in the often-cited cocktail party situation, the voices of several 
speakers must be kept apart to avoid semantic and phonetic confusions. Various environmental 
sounds could simulate phonetic events and need to be segregated from the true speech stream. 
In the speech signal itself, information pertaining to speaker identity, emotion, room acoustics, 
etc., needs to be distinguished from the phonetic structure, and the overlapping consequences 
of segmental articulation need to be sorted out. These segregative processes have an important 
complementary role to play in speech perception: They ensure that integration is restricted to 
those pieces of information that belong together. Logically, segregation precedes integration, even 
though functionally they may be just the two sides of one coin. The more physically similar and 
intertwined the aspects to be segregated are, the more remarkable the segregative process will 
seem to us. 

A. Temporal and Spatial Segregation 

Without any doubt, there are several factors that enable perceivers to distinguish different 
sound sources or events, regardless of whether they are speech or not. One of these is temporal 
separation. Sounds occurring a long time apart will usually not be considered as belonging to the 
same event, although they may come from the same source. In speech, a few seconds are usually 
enough to segregate phrases or utterances, and a few hundreds of milliseconds of separation usually 
prevent integration of acoustic cues into a single phonemic decision. One demonstration of this 
fact may be found in studies of the distinction between single and geminate stop consonants. In 
a classic experiment, Pickett and Decker (1960) asked English-speaking subjects to distinguish 
between utterances such as "topic" and "top pick," varying only the duration of the silent /p/ 
closure. Between 150 and 300 ms were needed to obtain judgments of two /p/s (and two words) 
rather than just one; the precise duration depended on the overall speaking rate. (See also 
Obrecht, 1965; Repp, 1978b; 1979a.) If two different stop consonants follow each other, as in the 
nonsense word /abda/, about 100 ms of silent closure are needed to prevent integration of the 
two sets of formant transitions into a single stop consonant percept (e.g., Dorman, Raphael, & 
Liberman, 1979; Repp, 1978b). Dorman et al. (1979) cued the perception of /p/ in "split" solely 
by inserting a silent interval between an /s/ noise and the syllable "lit" (a percept that may be 
said to be a pure temporal integration illusion), and subsequently investigated how much silence 
was needed before subjects reported hearing "s" followed by "lit." This duration turned out to be 
as long as 600 ms. A subsequent replication (Repp, 1985b) obtained a shorter but still surprisingly 
long interval of 300-400 ms. To cite a final example, Tillmann, Pompino-Marschall, and Porzig 
(1984) investigated how much temporal offset of optically and acoustically presented syllables 
was needed to destroy the audiovisual integration effect discovered by McGurk and MacDonald 
(1976). It turned out to be 250-300 ms. These various situations have little in common, which 
explains the different results. The precise duration of the critical interval for segregation surely 
depends on many factors and does not reflect any general limits of temporal integration. Rather, 
within the auditory modality it may be related to the closure durations normally encountered in 
natural speech (see, e.g., Pickett & Decker, 1960; Repp, 1983a). 

Temporal asynchrony is a helpful cue in distinguishing speech from other environmental 
sounds. This was elegantly demonstrated in a series of studies by Darwin (1984; Darwin & 
Sutherland, 1984), who investigated under what conditions a pure tone added to one of the 
(pure-tone) harmonics of a synthetic vowel was treated by listeners as part of the vowel spectrum 
or as a separate nonspeech event. Darwin showed that, when the tone coincided with the vowel, 
it affected the perceived vowel quality. However, when the onset of the tone preceded that of the 
vowel or, to a lesser extent, when its offset lagged behind that of the vowel, listeners excluded it 
from the phonetic information. Similar principles of segregation or "auditory stream formation" 
have been demonstrated in the perception of nonspeech sounds by Bregman and Pinker (1978). 

Another factor that may cause segregation is spatial separation. In real life, the separation 
of several simultaneous voices or of speech from background noises is often possible because they 
are perceived as coming from different locations. In the laboratory, presentation over the two 
channels of earphones has been used to induce segregation. One interesting case in which this form 
of spatial separation does not seem to prevent integration is split-formant or duplex perception, 
discussed above. Note, however, that in duplex perception one component of the speech signal 
(the "chirp") is segregated and heard as a separate auditory event; the paradox is that this 
event is still, at the same time, integrated with the speech in the other ear. (See Bregman, 1987.) 
There are many other instances, however, particularly those in which there is no temporal overlap 
between the two signals, where spatial separation is sufficient to disrupt perceptual integration. 
For example, informal observations suggest that, if the artificial "split" created by concatenating 
"s" and "lit" with some intervening silence is divided between the two ears, so that "s" occurs 
in one ear and "lit" in the other, this is exactly what listeners report hearing; that is, there 
is no /p/ percept any more. Similarly, when nasal-consonant-vowel syllables such as /mi/ or 
/ni/ are divided between the two ears, so that the nasal murmur occurs in one and the vocalic 
portion containing the formant transitions in the other, listeners have great difficulty identifying 
the consonant, or in any case do not perform better than if the two components were presented 
by themselves (Repp, 1987a). Of course, it is always possible to integrate independent sources of 
information at a cognitive level. These two examples illustrate the role of spatial separation as 
a segregating factor. Unfortunately, in real life both temporal and spatial separation are often 
unavailable as segregating agents, and listeners need additional means of sorting out the incoming 
stream of auditory information. 

B. Spectral Segregation 

When irrelevant (speech or nonspeech) sounds are superimposed on speech, listeners have 
basically two means of segregation at their disposal: Segregation according to local spectral 
disparity, and according to spectro-temporal (and, in part, speech-specific) criteria of pattern 
coherence. There are, of course, many sounds in the environment, including those produced by 
most musical instruments, that are sufficiently different from speech to be perceived immediately 
as different sources. Local spectral segregation is not always effective, however, and for good 
reason: First, some nonspeech events (e.g., the pops of bottles or the hisses of steam valves) are 
spectrally similar to speech sounds and thus are difficult to separate from them locally. Second, 
and more importantly, speech itself is composed of acoustic segments of diverse spectral composition, 
and it would be counterproductive if listeners were prone to segregate them, because 
these segments more often than not map onto the same linguistic unit. Indeed, perceptual segregation 
of spectrally dissimilar natural speech components can usually be demonstrated only 
under special conditions, which rarely occur outside the laboratory. Thus, Cole and Scott (1973) 
rapidly iterated fricative-vowel syllables and found that listeners sometimes reported two streams 
of events: a train of fricative noises, and a train of vowels, especially when the vocalic formant 
transitions were removed. A similar phenomenon was obtained with the repeated syllable /ska/ 
by Diehl, Kluender, and Parker (1985), who then used their findings to explain the different effects 
of /spa/ or /ska/ stimuli as adaptors (or precursors) in selective adaptation and pairwise contrast 
paradigms (Sawusch & Jusczyk, 1981; Sawusch & Nusbaum, 1983). The selective adaptation task 
requires cyclic repetition of a single stimulus, the adaptor, and thus may produce "streaming" of 
signal components, so that /spa/ is heard as /s/ and /ba/, with the phonological status of the stop 
consonant altered. Repp (1981) was able to induce listeners through some training to segregate 
a fricative noise from a following vowel and "hear out" the spectral quality of the noise. Even 
the individual formants of vowels may segregate under certain conditions. Thomas, Hill, Carrol, 
and Garcia (1970) and Warren and Warren (1970) observed that it was difficult to perceive the 
correct temporal order of four rapidly cycling steady-state vowels, and Dorman, Cutting, and 
Raphael (1975) found that this was because in such artificial sequences individual formants tend 
to group together and form separate auditory streams. There are anecdotal reports of phoneticians 
being able to "hear out" individual formants of vowels (e.g., Halle, Hughes, & Radley, 1957; 
Schubert, 1982), but this ability has remained rare. Still, these various findings underline the fact 
that spectrally diverse components of the speech signal are potentially segregable; fortunately, 
however, they are perceptually integrated under normal circumstances. 

When two different speech streams co-occur, differences in fundamental frequency, intonation 
pattern, or voice quality may provide cues for separation, in addition to higher-level factors 
such as syntactic and semantic continuity. Effects of this kind have been found in classical 
work on selective attention reviewed by Treisman (1969). More recently, Brokx and Nooteboom 
(1982) obtained a beneficial effect of differences in fundamental frequency and intonation on 
the identification of meaningless sentences presented against a background of a read story. In 
the much more artificial situation of two simultaneous steady-state vowels, Scheffers (1983) and 
Zwicker (1984) found an improvement in recognition performance when a fundamental frequency 
difference was introduced. Since the magnitude of the difference beyond one semitone did not 
seem to play a role, the function of F0 differences in this case seems to be to prevent fusion of the 
two sounds. Similar, though small, effects of F0 on identification scores have also been obtained 
in dichotic listening studies using synthetic syllables (Halwes, 1969; Repp, 1976a; Tartter & 
Blumstein, 1981) or vowels (Zwicker, 1984). 

The potential of fundamental frequency (F0) and voice quality cues to segregate successive 
portions of speech has also been demonstrated in the laboratory. The mechanisms studied here 
must be involved in separating different speakers from each other. Several relevant studies have 
used stimuli in which perception of a stop consonant rested on the duration of a silent closure 
interval. Dorman et al. (1979) found that when the speech on each side of the silence was 
produced by different voices, the silence lost its perceptual effectiveness; that is, listeners did not 
integrate across it. On the other hand, Rakerd, Dechovitz, and Verbrugge (1982) and Verbrugge 
and Rakerd (1986) have shown that silence retains its effectiveness between syllables produced by 
male and female voices if the general articulatory and intonational pattern is continuous across 
the two speakers (achieved by cross-splicing two intact utterances). When the second syllable was 
spliced onto a first syllable originally produced in utterance-final position, however, the phonetic 
effect of the silence was disrupted. Thus it seems that dynamic spectro-temporal information 
about articulatory continuity can override differences in F0 or voice quality. A disruptive effect 
of discontinuities in intonation on stop consonant perception has also been reported by Price 
and Levitt (1983), but such an effect was absent in a recent study (Repp, 1985a) in which a 
constant fricative noise preceded the critical silence, suggesting that the breaks in the F0 contour 
are effective only when voiced signal portions immediately abut the silent closure interval. 

C. Segregation of Linguistic and Paralinguistic Information 

So far I have discussed segregation of two kinds: One separates speech from other, irrele- 
vant sounds (including competing speech streams), and the other dissociates consecutive parts 
of the same speech stream — a laboratory-induced phenomenon to be avoided in natural speech 
communication. These segregative processes are "literal" in that they result in the perception 
of separate sound sources. Segregative processes are also essential, however, when listening to 
a single speech source, and for two reasons. First, the speech signal conveys in parallel, and 
largely over the same time-frequency channels, information about phonetic composition, speaker 
characteristics (vocal tract size, sex, age, identity, emotion), and room or transmission char- 
acteristics (reverberation, distortion, filtering). A listener needs to separate these three kinds 
of information, which Chistovich (1985) has termed "phonetic quality," "personal quality," and 
"transmission quality," respectively. (See also Traunmuller, 1987.) Second, the acoustic information 
for adjacent phonemes is overlapped and merged, a phenomenon commonly referred to as 
coarticulation or "encoding." If phonemic units are to be recovered, the information pertaining 
to one phoneme needs to be separated from that for another — or so it seems. Both these kinds of 
segregation are not literal in the sense that they make a speech stream disintegrate perceptually; 
rather, they separate different aspects of a coherent perceptual event by relating these aspects to 
different conceptual categories or dimensions represented in long-term memory. They operate on 
the information in the signal, not on the signal itself. 

Of the various types of information segregation of the first kind, that of separating vocal tract 
size information from phonetic information has received the most attention under the heading of 
speaker normalization. An explicit solution to this problem is of vital importance to automatic 
speech recognition as well as to any theory of speech perception. In fact, the focus has been 
so exclusively on the speaker-independent recovery of phonetic information that it is sometimes 
forgotten that listeners extract several kinds of information in parallel. Rather than "normalizing" 
their internal representation of the speech wave and discarding information in the process, they 
presumably use all available kinds of information to mutual advantage. 

Studies of speaker normalization have, for the most part, been concerned with vowels rather 
than consonants, and with acoustic analysis and automatic recognition rather than with human 
perception. Older normalization algorithms often required knowledge of a speaker's whole vowel 
space or average formant frequencies (see Disner, 1980), whereas more recent work has focused on 
perceptually more relevant transformations based on parameters that are immediately available in 
the incoming speech signal (e.g., Suomi, 1984; Syrdal & Gopal, 1986; Traunmuller, 1984a). There 
have been relatively few perceptual studies on this topic; the general assumption has been that it is 
sufficient to define acoustic properties that are relatively speaker-invariant and also plausible in the 
light of what is known about the auditory system. Demonstrations of "perceptual normalization" 
usually show a performance decrement in a listening situation where speaker characteristics are 
varied rapidly and unpredictably, compared to one in which the speaker remains constant (e.g., 
Ladefoged & Broadbent, 1957; Summerfield & Haggard, 1975; Verbrugge, Strange, Shankweiler, 
& Edman, 1976). Although emphasis is sometimes placed on the perceptual "advantage" resulting 
from effective normalization, the negative consequences of presenting contrived and misleading 
stimuli are perhaps the more salient outcome of this research (which is by no means unique in 
this respect). 
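
The contrast between the two families of normalization procedures mentioned above can be 
illustrated with a minimal sketch. It is not a reconstruction of any published algorithm: the 
function names, the scaling by a speaker's mean formant values, and the example frequencies 
are assumptions made for illustration only, and the Bark conversion uses one common 
approximation (that of Zwicker and Terhardt), which may differ from the scale used in a given study. 

```python
import math


def hz_to_bark(f_hz):
    """Convert Hz to Bark using the Zwicker and Terhardt approximation."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def extrinsic_normalize(vowels):
    """Older style (assumed form): divide each formant by the speaker's mean.

    Needs the speaker's whole vowel set before any token can be normalized.
    """
    n = len(vowels)
    mean_f1 = sum(v["F1"] for v in vowels) / n
    mean_f2 = sum(v["F2"] for v in vowels) / n
    return [(v["F1"] / mean_f1, v["F2"] / mean_f2) for v in vowels]


def intrinsic_normalize(vowel):
    """Newer style (assumed form): Bark differences within a single token.

    Uses only parameters available in the token itself (F0, F1, F2).
    """
    f0, f1, f2 = (hz_to_bark(vowel[k]) for k in ("F0", "F1", "F2"))
    return (f1 - f0, f2 - f1)


if __name__ == "__main__":
    # Hypothetical values in Hz, chosen for illustration; not measured data.
    speaker_a = [
        {"F0": 120.0, "F1": 270.0, "F2": 2290.0},
        {"F0": 120.0, "F1": 730.0, "F2": 1090.0},
    ]
    print(extrinsic_normalize(speaker_a))
    print(intrinsic_normalize(speaker_a[0]))
```

The first function cannot be applied until the speaker's vowel space is known, whereas the 
second operates on one token at a time, which is the sense in which the more recent proposals 
rely on information immediately available in the signal. 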

Analogous experiments have been conducted on normalization in the temporal domain — that 
is, on the perceptual separation of speaking rate from phonetic length (see review by Miller, 1981). 
An especially interesting question arises in research on tone languages, where the listener must 
segregate lexical tones from the overall intonation contour (e.g., Connell, Hogan, & Rozsypal, 
1983) and from speaker-dependent variation in F0 (Leather, 1983). In that connection, it is 
noteworthy that there is mounting evidence (reviewed by Ross, Edmondson, & Seibert, 1986) 
that tone and intonation perception (and production) are controlled by opposite hemispheres of 
the brain. At least some forms of linguistic/paralinguistic segregation may thus have a basis in 
neurophysiological compartmentalization. A general conclusion to be drawn from research on 
perceptual normalization is that the auditory parameters underlying phonetic classification are 
not absolute quantities but relationships in the spectral and/or temporal domain, computed over 
a relatively restricted temporal interval, whereas properties signalling speaker sex or identity, 
emotion, speaking rate, etc., accumulate over longer stretches of speech and/or are based on 
more nearly absolute quantities. 

D. Segregation of Intertwined Linguistic Information 

The emphasis on linguistic information in the vast majority of speech perception studies 
makes it difficult to find good examples of research on perceptual segregation of linguistic and 
(rather than from) nonlinguistic information. Examples of segregation of equivalent information 
are easier to find when only linguistic information is involved. This leads me to the final topic, 
one that has been of enormous significance in speech perception research — the problem of seg- 
mentation, that is, the perceptual separation of the overlapped acoustic correlates of adjacent 
phonemic units, particularly of vowels and consonants. 

One traditional view of the listener's task has been that it is one of phoneme (or feature) 
extraction, including "compensation" for contextual influences on a segment's acoustic correlates 
(see the critique by Fowler, 1986). Numerous studies have shown that listeners perceive segments 
as if they knew all the contextual modifications their acoustic representations undergo (see Repp, 
1982; Repp & Liberman, 1987). Thus, for example, a fricative noise ambiguous between /s/ 
and /ʃ/ in isolation is perceived as /s/ when followed by /u/ but as /ʃ/ when followed by 
/a/ (Mann & Repp, 1980). One way of describing this finding is that listeners "know" that 
anticipatory liprounding for /u/ may lower the spectrum of a preceding fricative noise, so they 
adopt a different criterion for the /s/-/ʃ/ distinction in that context. This view, which emphasizes 
the role of tacit phonetic knowledge in speech perception, has recently been elaborated by such 
authors as Flege (in press) and Repp (1987b). The perceptual accomplishment seems more 
integrative than segregative from that perspective. 

An alternative view, having an equally long history, has a recent proponent in Fowler (1984, 
1986; Fowler & Smith, 1986) who has likened the separation of overlapping segmental information 
to mathematical vector analysis. According to her theory, listeners literally subtract or factor 
out the influences of one segment on another, so that invariant segments are "heard." Fowler 
conceives of phonetic segments as articulatory events, not as abstract mental categories (see 
the exchange on coarticulation between Fowler, 1980, 1983, and Hammarberg, 1982), though 
listeners are assumed to be able to judge their "sound" (Fowler, 1984). Several experiments by 
Fowler (1981, 1984; Fowler & Smith, 1986) were intended to demonstrate this. They showed 
that subjects judge acoustically different representations of a segment to be more similar than 
acoustically identical ones if the former occur in their original contexts while the latter have been 
spliced into inappropriate contexts. However, since only the former match what listeners expect 
to hear in a given context, these results are also compatible with an alternative account based 
on tacit knowledge of contextual effects in speech production (e.g., Repp, 1982; 1987b). That is, 
rather than having access to the sound of segments (Fowler, 1984), listeners may have made their 
judgments on the basis of the discrepancy of the input from context-sensitive mental norms or 
prototypes. 

Other recent experiments in a similar vein have addressed the separation of nasality and 
vowel height information in nasalized vowels. Kawasaki (1986) showed that English listeners 
judge vowels in /m_m/ environment as increasingly nasal as the surrounding nasal murmurs 
are attenuated; that is, when the nasal consonants are intact, the vowel nasality is attributed 
to (coarticulation with) the nasal consonants, as it were, and is "factored out" from the vowel 
percept. Building on this result, Beddor, Krakow, and Goldstein (1986) first established that 
there are different category boundaries on synthesized /bɛd/-/bæd/ and /bɛ̃d/-/bæ̃d/ continua. 
English listeners apparently interpret some of the spectral consequences of nasalization as a 
change in vowel height. However, when an appropriate "conditioning environment" was added in 
the form of a postvocalic /n/, the category boundary on the resulting /bɛ̃nd/-/bæ̃nd/ continuum 
was identical with that on the /bɛd/-/bæd/ continuum, as if listeners attributed the vowel nasality 
to (coarticulation with) the nasal consonant and "factored it out" in Fowler's sense. The result is 
equally compatible, however, with a theory that postulates context-sensitive vowel (or syllable) 
prototypes. Indeed, it may be difficult to come up with any decisive experiments. Mentalism and 
realism may simply represent different metatheoretical perspectives. 

Current efforts at Haskins Laboratories to model articulation as a sequence of overlapping 
segmental gestures (e.g., Browman & Goldstein, 1986; Kelso et al., 1986) may ultimately provide 
ways of recovering these gestures from the acoustic signal and thus provide a machine implemen- 
tation of Fowler's vector-analytic concept. A promising mathematical technique for achieving the 
same goal, based on principal components analysis of vocal tract area function parameters, has 
been proposed by Atal (1983) and is currently being explored by Marcus (Marcus & Atal, 1986; 
Marcus & Van Lieshout, 1984). The recovery of articulatory parameters from the acoustic signal 
remains a central problem in speech research because phonemes and alphabets surely represent 
an articulatory, not an acoustic classification. However, while a solution of this problem would 
bring us a great step forward, processes of integration and segregation would still be needed to 
translate the articulatory "score" into a sequence of discrete segments. 

IV. SPEECH PERCEPTION WITHOUT INTEGRATION AND SEGREGATION? 

In the introduction, I discussed four basic assumptions: the separation of the physical and 
mental worlds, the existence of physical units, the existence of mental units, and the existence of 
processes relating the two kinds of units. Can a theory of speech perception do without them? 
The assumptions are not independent, of course: If the physical and mental worlds are distinct, 
they must receive different descriptions; to be easily communicable in the scientific world, these 
descriptions must be in terms of discrete concepts or units; and this results in certain functions 
or relationships between the two descriptive domains. If the physical and mental worlds were 
isomorphic, there would be no need for a theory of perception. If one or the other description 
were without units (more likely an error of omission than a deliberate theoretical choice), then 
perception would seem either entirely integrative or entirely segregative — not an attractive state of 
affairs. Denial of functions, however abstract, linking the two domains would merely impoverish 
perceptual theory. Certainly we need these functions in theories of auditory processing and 
organization. As to the perception of phonetic information, however, an alternative approach has 
been proposed. 

This approach, stated most eloquently by Studdert-Kennedy (1985) and Fowler (1986), fol- 
lows the "direct-realist" perspective of ecological psychology (see, e.g., Gibson, 1979; Warren 
& Shaw, 1985). Although it affirms the existence of linguistic units as articulatory events, it 
essentially abandons the distinction between the physical and mental domains. The segmental 
structure of speech (as characterized by the linguist or phonetician) is assumed to be ever-present 
on its way from the speaker's to the listener's brain. There is assumed to be a direct isomor- 
phism between physical and mental descriptions of speech events (such as phonemes), though 
it is acknowledged that the appropriate physical and motor-dynamic descriptions have not been 
fully worked out. Thus this school of thought rejects the idea that the input is divided into parts 
that need to be integrated or segregated by the listener; rather, the input units are taken to 
be identical with the perceptual units — that is, they are already integrated or segregated with 
respect to more primitive acoustic or auditory units. The deliberate strategy of this philosophy 
is to eliminate classical problems in perceptual research (such as segmentation and invariance) by 
redefining and redescribing physical events. Rather than being attributed to the perceiver's brain, 
the burdens of information integration and segregation thus fall upon the investigator trying to 
find an "integral" description of "separate" speech events. However, this effort is equivalent to 
that of finding a principled explanation of perceptual integration and segregation: If we can show 
that certain pieces of input are always integrated, we might as well call them integral and treat 
them as a single piece in our descriptions — if we only had names for them. Behind the rhetoric 
and the different terminologies of mentalistic and realistic approaches lies a common goal: to 
arrive at the most economic characterization of linguistic structure in all its physical incarna- 
tions. Clearly, even speech research propelled by a mentalistic philosophy (still predominant in 
the field) must strive to minimize the work attributed to a speaker-listener's mind. But will we be 
able to relieve it of its entire burden to integrate and segregate? What we take away (in theory) 
is likely to re-emerge as logical conjunctions, disjunctions, and relational terms in our physical 
characterization of speech events. As long as we scientists communicate in conventional language, 
integration and segregation at some stage in our theories will be difficult to avoid. 
SPEECH PERCEPTION TAKES PRECEDENCE OVER NONSPEECH 
PERCEPTION* 



D. H. Whalen and Alvin M. Liberman† 

Abstract. When made more intense, some components of a speech 
signal can be heard simultaneously as speech and nonspeech — a form 
of "duplex" perception — though at lower intensities, the speech alone 
is heard. Such intensity-dependent duplexity implies the existence of 
a phonetic mode of perception that takes precedence over auditory 
modes. 

INTRODUCTION 

One theory of speech perception holds that there is a biologically distinct system, or "module," 
specialized for extracting phonetic elements — notably, consonants and vowels — from the 
sounds that convey them (Liberman & Mattingly, 1985). The percepts produced by this module 
are immediately phonetic in character; accordingly, they stand apart from auditory percepts that 
are composed of such dimensions as pitch, loudness, and timbre. There is, then, no first-stage 
auditory percept, as most other theories of speech require (Cole & Scott, 1974; Oden & Massaro, 
1978; Stevens, 1975), hence no need for a subsequent stage in which the auditory tokens are 
matched to phonetic prototypes, and so made appropriate for further processing as language. 
Indeed, as the experiments reported here show, it is the phonetic module that has priority, as 
if its processes occurred before, not after, those that yield the standard dimensions of auditory 
perception. 

Consistent with the existence of a distinct phonetic mode is the fact that a particular piece of 
sound can evoke radically different percepts, depending on whether or not it engages the phonetic 
module. Consider, for example, acoustic patterns sufficient for synthesizing on a computer the 
syllables "da" and "ga," as shown at the top of Figure 1. The three formants represent resonances 
of the vocal tract and have, at their onsets, frequency sweeps called transitions. These transitions 
last approximately 50 ms and reflect the way the resonances change as the tongue and jaw move 
from the consonant to the vowel. Normally, the perceived distinction between "da" and "ga" 
depends on many acoustic variables; as seen in the figure, however, it can be made to depend 
only on differences in the transition of the third formant. Thus, in the context of the syllable, 

* Science, 1987, 237, 169-171. 

† Also Yale University and University of Connecticut. 
Figure 1. Schematic representation of the syllables. 

these transitions become crucial to the phonetic percept. But in isolation (as at bottom right 
of the figure) they are heard as the glissandi or differently pitched "chirps" that psychoacoustic 
considerations would lead one to expect. These two ways of perceiving the formant transitions — 
one phonetic, the other auditory — are strikingly different: There is no hint of chirpiness in the 
"da" or "ga," and no da-ness or ga-ness in the chirps; moreover, the transitions are discriminated 
differently depending on the mode in which they are perceived (Mattingly, Liberman, Syrdal, & 
Halwes, 1971). 

Under special circumstances, the transitions can evoke the phonetic and auditory percepts 
simultaneously. This curious effect, called "duplex perception," occurs when the third-formant 
transition is presented by itself to one ear, while the remainder of the pattern, called the "base," 
(see the bottom left of the figure) is presented to the other. Listeners then simultaneously hear 
a chirp (in the ear to which the transition is presented) and (in the other ear) the syllable "da" 
or "ga," as determined by the transition. These simultaneous percepts, and the very different 
discrimination functions they yield, are very nearly the same as those produced, separately, by 
the isolated transitions and the whole syllable (Mann & Liberman, 1983). 

Since duplex perception occurs in response to a fixed acoustic pattern and results in two 
simultaneous percepts, it can hardly be attributed to auditory interactions arising from changes 
in acoustic context or to a shifting of attention between two forms of an ambiguous stimulus. And 
the fact that the "da" or "ga" is perceived to be entirely in one ear, though the critical transition 
had been presented only to the other, argues that the incorporation of the transition into the 
base is an integration at the perceptual level, not a "cognitive" afterthought, that deliberately 
combines what had initially been perceived as separate. 

Thus, duplex perception provides support for the view that there are distinct phonetic and 
auditory ways of perceiving the same (speech) signal, but in so doing, it poses a question that 
might otherwise have gone unasked: Why, in the normal case, are the components of speech not 
perceived duplexly — that is, why is the "da" or "ga" not normally accompanied by the chirp? 

Relying on considerations of plausibility and parsimony, Mattingly and Liberman (in press) 
proposed that the phonetic module "preempts" the phonetically relevant parts of the signal before 
making the remainder available to auditory processing. This proposal seemed plausible, because, 
in contrast to the indefinitely large set of acoustic events that occur, phonetic events form a natural 
class that is defined by its correspondence to the acoustic results of specialized movements of 
the articulatory organs. The proposal was parsimonious because the very processes of phonetic 
perception remove from the signal all evidence of those phonetic events, and thus preclude such 
(parallel) processing as would cause them to be perceived yet again as chirps. This "preemptiveness" 
is similar to the precedence we have spoken of, and that we mean to demonstrate directly 
with a new and somewhat simpler version of duplex perception. (See Darwin & Sutherland, 1984, 
p. 206, for a related observation.) 

The new procedure differs from the old in that the two parts of the signal are not divided 
between the ears, but are, rather, presented equally to both. Now duplexity is produced (in 
both ears at once) by changing the intensity of the transition relative to the base. At relatively 
low intensities, the transitions serve only their expected phonetic function. At higher intensities, 
however, the transitions continue to make their phonetic contribution but simultaneously evoke 
nonspeech "chirps." These observations, which we made initially in pilot experiments, suggested 
that we test the following generalizations: 

1) In isolation, neither transition sounds like "da" or "ga." 

2) In syllabic context, the transitions will, at some intensity, evoke nonspeech chirps, es- 
tablishing a "duplexity threshold." 

3) Above the duplexity threshold, the chirps can be matched to those evoked by the transitions 
in isolation. 

4) Both below the duplexity threshold and above it, the transitions appropriately determine 
whether the syllable is heard as "da" or "ga." 

The stimuli were the same as those represented in the figure, except that the third-formant 
transitions were not frequency bands excited by a fundamental (as were the formants of the 
base), but, rather, time-varying sinusoids that follow the center frequencies. We had found that 
such sinusoidal transitions combine with the formant-synthesized base to make coherent phonetic 
percepts, in this case "da" and "ga," but the sinusoids have the advantage, for our purposes, 
that in isolation they produce "whistles," which we found to be more easily discriminated than 
the chirps, and even less speech-like. 

The base syllable was created with a software formant synthesizer; the sinusoids were cre- 
ated with another software synthesizer designed for pure-tone generation. From a set of input 
parameter values representing frequencies and amplitudes, each synthesizer calculated a digital 
waveform that was then turned into sound via a digital-to-analog converter. 
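
As a rough illustration of the pure-tone synthesis step, the sketch below computes a digital waveform for a sinusoid whose frequency glides between two values. It is a minimal sketch and not the synthesizer used in the study: the function name, the linear glide, the 10 kHz sampling rate, and the start and end frequencies are all assumptions chosen for illustration.

```python
import math


def synthesize_gliding_sinusoid(freq_start_hz, freq_end_hz, duration_s,
                                sample_rate_hz=10000, amplitude=1.0):
    """Return samples of a sinusoid whose frequency glides linearly.

    All parameter values are hypothetical; the source does not give the
    trajectories or the sampling rate of its pure-tone synthesizer.
    """
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    phase = 0.0
    for n in range(n_samples):
        # Instantaneous frequency, interpolated linearly over the glide.
        frac = n / max(n_samples - 1, 1)
        freq = freq_start_hz + frac * (freq_end_hz - freq_start_hz)
        samples.append(amplitude * math.sin(phase))
        # Accumulating phase keeps the waveform continuous as frequency changes.
        phase += 2.0 * math.pi * freq / sample_rate_hz
    return samples


# Hypothetical falling glide, standing in for a third-formant transition.
falling = synthesize_gliding_sinusoid(2800.0, 2500.0, 0.05)
```

Phase accumulation, rather than evaluating sin(2πft) with a changing f, is used here because it avoids discontinuities when the frequency varies over time.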

The base was synthesized in one computer file and the two sinusoidal transitions (one modeled 
after "d" and one after "g") in two other files. The base and one transition could then be output 
through synchronized D-to-A channels, separately attenuated, and electronically combined for
presentation over headphones as a single sound to subjects. The base was presented at a fixed 
intensity of 72 dB SPL. 

Eleven young adult speakers of English (six female and five male) with no reported hearing 
problems were run in separate sessions. None knew anything about the composition of the stimuli 
or the purpose of the experiment. They were paid for their participation. One failed to perceive 
in a duplex fashion at the intensity levels available, and so was excluded from all analyses. 

Initially, subjects were asked to identify the sinusoidal transitions as "da" or "ga." Twenty 
repetitions of each were presented in random order. The subjects implied that they considered 
the request absurd, since, as they insisted, the whistles did not sound at all like speech. They 
nevertheless complied, with results that are shown in the first column of Table 1. (For all tests,
there was no significant difference between the responses to the "d" and "g" stimuli, so only the 
combined percentages are reported.) Most subjects picked one whistle or the other as "da" and 
held to that consistently. Some happened to pick the correct one; others were just as consistently 
wrong. One (S9) simply called all the whistles "da." Overall, identification accuracy did not 
differ significantly from chance, t(9) = 1.22, n.s.



Table 1

Percent correct performance on the four main tasks (results from 40 trials per subject).

           Identification   Match of        Identification of syllables
           of isolated      "duplex" to     as "da" or "ga"
           sinusoids        isolated        below duplexity   above duplexity
Subject    as "d" or "g"    sinusoids       threshold         threshold

1               72.5            92.5            100.0             100.0
2              100.0            65.0            100.0              97.5
3               15.0            97.5            100.0             100.0
4               95.0            97.5            100.0             100.0
5               30.0            85.0             97.5             100.0
6               95.0            72.5             92.5              85.0
7              100.0            87.5             82.5              97.5
8                0.0            95.0             52.5             100.0
9               50.0            47.5            100.0              97.5
10              90.0            65.0            100.0             100.0

Mean            64.8            80.5             92.5              97.8
S.E.M.         ±12.1            ±5.4             ±4.8              ±1.5


To find the intensity at which the sinusoids in syllabic context evoked nonspeech whistles in 
addition to "da" or "ga" (the "duplexity threshold"), we had the subjects adjust the attenuator 
that controlled the intensity of the sinusoid until the whistle was just audible. This was done three 
times for each sinusoid. The mean duplexity thresholds for all subjects, expressed in relation to
the steady-state of the third formant, were -6.4 dB (s.d. 5.0 dB) for the "da" sinusoid and 0.0 dB
(s.d. 4.9 dB) for the "ga" sinusoid. This difference in duplexity thresholds, which was found for
all ten subjects, is consistent with the fact that, in isolation, the "da" sinusoid — the one with the 
lower duplexity threshold — was louder. 
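
To make the decibel values concrete, the following sketch converts the two mean thresholds into amplitude ratios relative to the third-formant steady state. It assumes the standard amplitude convention of 20 log10 of the ratio; the function and constant names are illustrative only.

```python
def db_to_amplitude_ratio(level_db):
    """Convert a level in dB to an amplitude ratio (20 * log10 convention)."""
    return 10.0 ** (level_db / 20.0)


# Mean duplexity thresholds reported in the text, in dB re the F3 steady state.
DA_THRESHOLD_DB = -6.4
GA_THRESHOLD_DB = 0.0

da_ratio = db_to_amplitude_ratio(DA_THRESHOLD_DB)  # a little under half the reference amplitude
ga_ratio = db_to_amplitude_ratio(GA_THRESHOLD_DB)  # equal to the reference amplitude
threshold_gap_db = GA_THRESHOLD_DB - DA_THRESHOLD_DB
```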

To make sure that the whistle component of the duplex percept was comparable to the 
whistle of the sinusoid in isolation, we carried out a matching test. On each trial, three stimuli 
were presented: first, one sinusoid in isolation, then either of the two sinusoids in syllabic context, 
and finally the other sinusoid in isolation. Each sinusoid occurred with the syllable twenty times, 
matching the first sinusoid or the last an equal number of times. The sinusoid in the syllable was 
presented at 6 dB above the duplexity threshold for "ga." Subjects judged whether the duplexly
perceived whistle was more like the isolated whistle that preceded or followed. As the second 
column of Table 1 makes clear, subjects were able to do this rather demanding task well above 
chance, t(9) = 5.50, p < .001. 1

To test whether the sinusoids reliably determined how the syllable was perceived below the 
duplexity threshold, we set them 4 dB below the "da" duplexity threshold and presented twenty
repetitions of each in random order. Subjects were to identify the consonant as "d" or "g." Again,
they performed well above chance, t(9) = 8.88, p < .001, as seen in Table 1, column 3. 

It remained, then, to determine that the sinusoids continue to provide phonetic information 
even when they also evoke whistles. For that purpose, we set the sinusoids at 6 dB above the
higher ("ga") duplexity threshold and carried out an identification test like the one just described.
Comparing the rightmost columns of the table, we see that subjects were no less accurate above
the duplexity threshold than below it, t(9) = 32.60, p < .001 for Column 4.
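
The t statistics reported for the four tasks can be approximated from the per-subject percentages in Table 1. The sketch below assumes each test is a one-sample t test of the ten subjects' scores against a chance level of 50%; that assumption, and the variable names, are not stated in the source. Because the table values are rounded, recomputed statistics need not agree exactly with the printed ones.

```python
import math
import statistics

# Per-subject percent correct, transcribed from Table 1 (subjects 1-10).
TABLE_1 = {
    "isolated_identification": [72.5, 100.0, 15.0, 95.0, 30.0,
                                95.0, 100.0, 0.0, 50.0, 90.0],
    "matching": [92.5, 65.0, 97.5, 97.5, 85.0,
                 72.5, 87.5, 95.0, 47.5, 65.0],
    "syllables_below_threshold": [100.0, 100.0, 100.0, 100.0, 97.5,
                                  92.5, 82.5, 52.5, 100.0, 100.0],
    "syllables_above_threshold": [100.0, 97.5, 100.0, 100.0, 100.0,
                                  85.0, 97.5, 100.0, 97.5, 100.0],
}


def one_sample_t(scores, chance=50.0):
    """t statistic for the mean of `scores` against `chance` (df = n - 1)."""
    n = len(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(scores) - chance) / sem


t_values = {task: one_sample_t(scores) for task, scores in TABLE_1.items()}
```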

Thus, at lower levels of intensity, the sinusoids provide the basis for the perceived distinction 
between "da" and "ga"; at higher levels, they serve this same phonetic purpose, but also evoke 
nonspeech whistles. As we found from our own listening, the phonetic information is provided 
over a range of approximately 20 dB below the duplexity threshold; 2 the whistles, which are, of
course, barely audible at the duplexity threshold, become louder as the intensity of the sinusoid
is further increased. These results show that processing of the sinusoid as speech has priority,
thereby defining what we mean by precedence of the phonetic module. 

Unlike the earlier form of duplex perception, which required that the transitions and the 
remainder of the pattern be presented to different ears, the one reported here puts all parts of 
the pattern equally into both. It thereby avoids such complications of interpretation as may arise 
with dichotic stimulation, and so makes more straightforward the inference we would draw: that 
duplex perception reflects distinct auditory and phonetic ways of perceiving the same stimulus.

1 Below the duplexity threshold, such matching would presumably be at chance. Still, it is pos- 
sible that forced matching is a more sensitive measure than the one we used to obtain the threshold 
itself. So we applied the matching procedure at 4 dB below the lower ("d") threshold, using eight
highly practiced subjects. As expected, the responses (45.3% correct, t(7) = -1.28, p > 0.2) were
at chance. 

2 Bentin & Mann (1983) found a similar range in a dichotic task, though they interpreted it
as a difference in sensitivity, not as preemption. 
Beyond that, the results obtained with the new form of the duplex phenomenon support the
hypothesis that the phonetic mode has prior claim on the transitions, using them for its special
linguistic purposes until, having appropriated its share, it passes on the remainder to be perceived 
by the nonspeech system as "auditory" whistles. Such precedence reflects the profound biological 
significance of speech. 

References 

Bentin, S., & Mann, V. A. (1983). Selective effects of masking on speech and nonspeech in the
duplex perception paradigm. Haskins Laboratories Status Report on Speech Research,
SR-76, 65-85.

Cole, R. A., & Scott, B. (1974). Toward a theory of speech perception. Psychological Review, 81,
348-374.

Darwin, C. J., & Sutherland, N. S. (1984). Grouping frequency components of vowels: When is a
harmonic not a harmonic? Quarterly Journal of Experimental Psychology, 36A, 193-208.

Liberman, A. M., & Mattingly, I. G. (1985). The motor theory of speech perception revised.
Cognition, 21, 1-36.

Mann, V. A., & Liberman, A. M. (1983). Some differences between phonetic and auditory modes
of perception. Cognition, 14, 211-235.

Mattingly, I. G., & Liberman, A. M. (in press). Specialized perceiving systems for speech and
other biologically significant sounds. In G. M. Edelman, W. E. Gall, & W. E. Cowan
(Eds.), Functions of the auditory system. New York: Wiley.

Mattingly, I. G., Liberman, A. M., Syrdal, A. K., & Halwes, T. (1971). Discrimination in speech
and nonspeech modes. Cognitive Psychology, 2, 131-157.

Oden, G. C., & Massaro, D. W. (1978). Integration of featural information in speech perception.
Psychological Review, 85, 172-191.

Stevens, K. N. (1975). The potential role of property detectors in the perception of consonants. In
G. Fant & M. Tatham (Eds.), Auditory analysis and perception of speech (pp. 303-330).
New York: Academic Press.






EVIDENCE OF TALKER-INDEPENDENT INFORMATION FOR 
VOWELS* 



Robert R. Verbrugge† and Brad Rakerd‡



Abstract. The vowel information present in initial and final regions 
of /b/-vowel-/b/ syllables was examined in this study. Vowels were
identified for unedited syllables spoken by a man and a woman, for
the initial 20% of those syllables, for the final 20% of the syllables,
for the initial and final 20% of the syllables combined and separated
by a 60% silent gap, and for the initial and final 20% of the syllables
interchanged across talkers and separated by a 60% silent gap. Results
indicate: (1) that there is considerable vowel information present in
the dynamic regions at the beginnings and endings of syllables; (2) that
the information is, to a large extent, carried relationally by those
regions; (3) that the information is talker-independent in form; and (4)
that the information is complementary to, and distinct from, formant 
frequency information present in a syllable's center. An experiment 
assessing the perceived source(s) of these stimuli suggests that source 
perception is influenced by as yet unspecified acoustic modulations de- 
fined at the syllable level. 

INTRODUCTION 

When a vowel is coarticulated with preceding and following consonants to form a syllable, 
the resulting acoustic pattern usually includes periods of rapid spectral change at its beginning 
and end, and a period of relative spectral constancy at its center. It is well established that the 
configuration of formant frequency values present, or best approximated, at the syllable center 
provides information about the identity of the vowel (e.g., Joos, 1948; Ladefoged, 1975; Peterson 
& Barney, 1952). After Strange, Jenkins, and Johnson (1983), we will refer to the ideal form of 
this configuration as an acoustic target.

There have been recurring indications that vowel information is also provided by the more 
dynamic regions of the syllable (Lehiste & Meltzer, 1973; Lindblom & Studdert-Kennedy, 1967;



* Language and Speech, 1986, 29, 39-57. 
Shankweiler, Verbrugge, & Studdert-Kennedy, 1978; Strange, Verbrugge, Shankweiler, & Edman,
1976). Perhaps the most compelling evidence of this comes from the experiments of Strange et 
al. (1983; also Jenkins, Strange, & Edman, 1983). Those investigators assessed the perception of 
stimuli that preserved only the dynamic beginnings and endings of /b/-vowel-/b/ syllables, the 
syllable centers having been deleted and replaced with silence. Listeners spontaneously integrated 
the initial and final portions of these "silent-center" syllables, typically hearing a single utterance 
with an interruption in the middle (somewhat like a glottal stop). More importantly, vowel 
identification for these syllables was remarkably accurate, not differing significantly from the 
accuracy of identification for unedited syllables.

Two competing explanations for this silent-center finding provide the motivation for the 
present study. First, it is conceivable that listeners used the dynamic regions of those syllables 
to extrapolate to the formant-frequency targets that had been excised from the syllable centers. 
Lindblom (1963; also Lindblom & Studdert-Kennedy, 1967) has suggested that listeners make 
such extrapolations as a matter of course when processing natural speech. Whenever a talker 
speaks rapidly or destresses the production of a syllable, formant frequencies are "reduced," i.e., 
they fail to reach target values at the syllable center (Joos, 1948; Lindblom, 1963). Lindblom's 
(1963) proposal is that in these situations listeners draw on information in the dynamic regions 
to compute the missing targets. Specifically, they are said to draw on the fact that the initial 
and final formant trajectories form exponential functions that decelerate toward, or accelerate 
from, asymptotic target frequencies. To summarize, on this view the dynamic regions of a sylla- 
ble contribute to vowel perception by subserving the more accurate estimation of target values 
approximated at the syllable center.
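
The extrapolation idea can be illustrated with a small calculation. If a formant trajectory approaches its target exponentially, f(t) = T + A·r^t, then three equally spaced samples determine the asymptote T in closed form. The sketch below uses that identity; it is only an illustration of target extrapolation under an idealized, noise-free exponential, not Lindblom's model, and the sample values are hypothetical.

```python
def extrapolate_asymptote(y0, y1, y2):
    """Estimate the asymptote of an exponential approach from three
    equally spaced samples: T = (y0*y2 - y1**2) / (y0 + y2 - 2*y1)."""
    denominator = y0 + y2 - 2.0 * y1
    if denominator == 0:
        # Samples lie on a straight line, so no finite asymptote is implied.
        raise ValueError("samples do not decelerate toward an asymptote")
    return (y0 * y2 - y1 * y1) / denominator


# Hypothetical first-formant samples (Hz) that rise and decelerate.
target_hz = extrapolate_asymptote(300.0, 500.0, 600.0)
```

With these hypothetical samples the successive steps halve (200 Hz, then 100 Hz), so the implied target lies 100 Hz beyond the last sample.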

An alternative view of the silent-center result is that the dynamic regions convey vowel in- 
formation that is complementary to, and distinct from, target information. One way to motivate 
this alternative is to think of vowels as articulatory events, that is, as gestures that manifest 
a characteristic organization of forces over the articulators (Fowler, 1977, 1980; Fowler, Rubin, 
Remez, & Turvey, 1980). From this perspective, the vowels of a dialect are distinguished by dif- 
ferent "styles" of articulatory movement. The resulting acoustic modulations provide substantial 
information about vowel identity, information that differs in kind from the target information 
present at a syllable's center. 

To test the competing claims of the target-extraction and event-perception hypotheses, we 
constructed hybrid silent-center syllables, pairing the initial and final portions of corresponding 
syllables spoken by a man and a woman. According to the target hypothesis, a hybrid syllable 
should be very disruptive perceptually. Because the man and woman have different vocal tract 
sizes and shapes, their corresponding syllable portions should "point to" very different targets. 
This is illustrated in Figure 1. On the left are spectrograms of the man's and woman's productions 
of the syllable /baeb/. On the right those spectrograms have been cross-spliced to juxtapose their 
centers. It is clear that the center formant frequencies are quite discrepant, making it highly 
unlikely that any extrapolated target values could coincide across talkers. 

According to the event hypothesis, a discrepancy in syllable centers is not necessarily disturb- 
ing. Talkers who speak a common dialect would be expected to produce a vowel with a common 
style of articulatory and acoustic change that is independent of idiosyncratic differences in vocal 
tract size. Therefore, the event hypothesis, in its strongest form, predicts that the woman's and 
Figure 1. Spectrograms of the man's and woman's productions of /baeb/ are presented on the left of the figure.
To create the patterns on the right, those spectrograms were cut at the center of their voiced regions, and the 
initial and final halves were interchanged. 

man's syllable portions should be integrated perceptually, and that accuracy of vowel identifica- 
tion should be high, perhaps as high as for single-talker silent-center syllables. 

EXPERIMENT 1: VOWEL PERCEPTION 

In this experiment we assessed the accuracy of vowel identification for hybrid-silent-center
syllables and for a number of comparison syllables. 

Method 

Stimuli 

The stimuli for all experimental conditions were derived from natural speech tokens of
/b/-vowel-/b/ syllables. Syllable vowels were the American English vowels /i, ɪ, e, ɛ, æ, ʌ, ɑ, ɔ, o, ʊ, u/.
A man and a woman each produced three tokens of each syllable. The syllables were produced in
citation form and were paced to match the beat of a metronome. Productions were recorded on
audio tape and then digitized for editing (sampling rate = 20 kHz). For each of the eleven vowels,
we selected the pair of syllables, one from each talker, that were most closely matched in duration.
In general it proved possible to find a very close match. The largest durational disparity was 20
ms and the average disparity was 4.5 ms (?% of the duration of the average voiced region, which
was the same for both talkers).
Table 1

The Woman's (W) and Man's (M) Formant Frequency Values in Hz,
and Their Absolute Differences Expressed as a Ratio of the Man's Values.

                    First Formant           Second Formant          Third Formant
Vowel               W     M   (W-M)/M       W     M   (W-M)/M       W     M   (W-M)/M

i                  320   320    0.00      2480  2080    0.19      3240  2840    0.14
ɪ                  400   480    0.17      2080  1760    0.18      2840  2480    0.15
e                  320   400    0.20      2240  1920    0.17      3000  2560    0.17
ɛ                  560   480    0.17      1840  1520    0.21      2560  2480    0.03
æ                  640   560    0.14      2080  1480    0.41      2920  2480    0.18
ɑ                  720   560    0.29      1320  1160    0.14      2920  2480    0.18
ʌ                  640   480    0.33      1240  1080    0.15      3000  2480    0.21
ɔ                  640   480    0.33      1240  1000    0.24      2920  2480    0.18
o                  480   400    0.20      1000   920    0.09      2760  2320    0.19
ʊ                  480   400    0.20      1160  1000    0.16      2760  2400    0.15
u                  320   320    0.00      1160   840    0.38      2760  2160    0.28

MEAN                            0.18                    0.21                    0.17
/e, o/ excluded                 0.18                    0.23                    0.17


Table 2

Average Women's (W) and Men's (M) Formant Frequency Values in Hz,
and Their Absolute Differences Expressed as a Ratio of the Men's Values.
These Data Are From Peterson and Barney (1952).

                    First Formant           Second Formant          Third Formant
Vowel               W     M   (W-M)/M       W     M   (W-M)/M       W     M   (W-M)/M

i                  310   270    0.15      2790  2290    0.22      3310  3010    0.10
ɪ                  430   390    0.10      2480  1990    0.25      3070  2550    0.20
ɛ                  610   530    0.15      2330  1840    0.27      2990  2480    0.21
æ                  860   660    0.30      2050  1720    0.19      2850  2410    0.18
ɑ                  850   730    0.16      1220  1090    0.12      2810  2440    0.15
ʌ                  760   640    0.17      1400  1190    0.18      2780  2390    0.16
ɔ                  590   570    0.04       920   840    0.10      2710  2410    0.12
ʊ                  470   440    0.07      1160  1020    0.14      2680  2240    0.20
u                  370   300    0.23       950   870    0.09      2670  2240    0.19

MEAN                            0.15                    0.17                    0.17

Spectral comparison 1: Between talkers. The formants of the woman's vowels (W
vowels) were typically higher in frequency than the formants of the corresponding vowels spoken
by the man (M vowels). Table 1 reports their formant frequency values and shows that, on
average, those values differed by 18%, 23%, and 17% for the first (F1), second (F2), and third (F3)
formants, respectively. 1 For comparison, we determined the average formant frequency differences
between men and women based on Peterson and Barney's (1952) normative vowel data. That
analysis is summarized in Table 2. Peterson and Barney found that formant values of an average
adult female talker (n = 28) differed from those of an average male talker (n = 33) by 15% for
F1, 17% for F2, and 17% for F3. The formant frequency differences between the two talkers of
the present study were very close to these norms.
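
The between-talker ratios can be recomputed directly from the Table 1 frequencies. The sketch below does this for F1 and F2, with and without the vowels /e, o/. The ASCII vowel labels (I, E, ae, a, V, O, U standing in for the phonetic symbols) and the function name are conventions adopted only for this example.

```python
# Table 1 values in Hz. ASCII labels stand in for the phonetic symbols.
VOWELS = ["i", "I", "e", "E", "ae", "a", "V", "O", "o", "U", "u"]
WOMAN_F1 = [320, 400, 320, 560, 640, 720, 640, 640, 480, 480, 320]
MAN_F1 = [320, 480, 400, 480, 560, 560, 480, 480, 400, 400, 320]
WOMAN_F2 = [2480, 2080, 2240, 1840, 2080, 1320, 1240, 1240, 1000, 1160, 1160]
MAN_F2 = [2080, 1760, 1920, 1520, 1480, 1160, 1080, 1000, 920, 1000, 840]


def mean_ratio(woman, man, exclude=()):
    """Mean of |W - M| / M over vowels, optionally skipping some vowels."""
    ratios = [abs(w - m) / m
              for vowel, w, m in zip(VOWELS, woman, man)
              if vowel not in exclude]
    return sum(ratios) / len(ratios)


f1_nine = mean_ratio(WOMAN_F1, MAN_F1, exclude=("e", "o"))
f2_nine = mean_ratio(WOMAN_F2, MAN_F2, exclude=("e", "o"))
f2_all = mean_ratio(WOMAN_F2, MAN_F2)
```

Rounded to two decimals, these means agree with the summary rows of Table 1.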

Spectral comparison 2: Within talkers. In absolute terms, the average formant frequency
differences between our W and M vowels were 80 Hz for F1, 282 Hz for F2, and 420 Hz
for F3. We wondered how these values compared with within-talker differences for the production
of different vowels. Table 3 shows an analysis in which each talker's formant frequencies
were rank-ordered and the differences between neighboring frequencies computed. The average
differences were 24, 124, and 68 Hz respectively for F1, F2, and F3 of M vowels, and 40, 148,
and 68 Hz for F1, F2, and F3 of W vowels. All of these values were less than half the size of
between-talker production differences. We expect, therefore, that if a listener extrapolated to
target values from the beginnings and endings of hybrid syllables, those targets would often be
associated with different vowels.
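
The rank-order-and-difference analysis is simple to reproduce. The sketch below applies it to the two talkers' F1 values from Table 1; the function name is an assumption made for this example.

```python
# First-formant values in Hz for the eleven vowels, from Table 1.
WOMAN_F1 = [320, 400, 320, 560, 640, 720, 640, 640, 480, 480, 320]
MAN_F1 = [320, 480, 400, 480, 560, 560, 480, 480, 400, 400, 320]


def mean_neighbor_difference(frequencies):
    """Rank-order the frequencies and average the gaps between neighbors."""
    ordered = sorted(frequencies)
    gaps = [upper - lower for lower, upper in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


woman_f1_gap = mean_neighbor_difference(WOMAN_F1)
man_f1_gap = mean_neighbor_difference(MAN_F1)
```

Because the gaps between sorted neighbors telescope, this average always equals the talker's range (highest minus lowest value) divided by the number of gaps, so it is a measure of how widely a talker's vowels are spread on that formant.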

The same expectation is supported by an analysis of the distribution of the two talkers' vowel
tokens in F1-F2 space. Figure 2 shows that distribution, for a space in which the axes have been
scaled to agree with those chosen by Peterson and Barney (1952). Note that for eight of the
eleven vowel categories the man's token is closest to a token of a different vowel in the woman's
space. In her case the mismatch is even more extreme; 10 of her 11 tokens lie nearest to a token
of a different category in his space. This clearly indicates that the initial and final portions of a
hybrid syllable would generally "point to" different target vowels when referred against a single
talker's F1-F2 space.
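
A nearest-neighbor comparison of this kind might be sketched as follows. Note the assumption: the sketch uses plain Euclidean distance on unscaled Hz values from Table 1, whereas the analysis above refers to axes scaled after Peterson and Barney, so the counts it produces need not reproduce the figures quoted in the text. The ASCII vowel labels and function names are illustrative.

```python
import math

# (F1, F2) in Hz from Table 1. ASCII labels stand in for the phonetic symbols.
WOMAN = {"i": (320, 2480), "I": (400, 2080), "e": (320, 2240),
         "E": (560, 1840), "ae": (640, 2080), "a": (720, 1320),
         "V": (640, 1240), "O": (640, 1240), "o": (480, 1000),
         "U": (480, 1160), "u": (320, 1160)}
MAN = {"i": (320, 2080), "I": (480, 1760), "e": (400, 1920),
       "E": (480, 1520), "ae": (560, 1480), "a": (560, 1160),
       "V": (480, 1080), "O": (480, 1000), "o": (400, 920),
       "U": (400, 1000), "u": (320, 840)}


def nearest_vowel(point, tokens):
    """Label of the token in `tokens` closest to `point` (unscaled Hz)."""
    return min(tokens, key=lambda label: math.dist(point, tokens[label]))


def count_cross_category(source, target):
    """How many source tokens lie nearest to a different vowel in target."""
    return sum(1 for label, point in source.items()
               if nearest_vowel(point, target) != label)


man_mismatches = count_cross_category(MAN, WOMAN)
woman_mismatches = count_cross_category(WOMAN, MAN)
```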

Experimental Conditions 

The W and M syllables were edited for presentation in our experimental conditions according
to the general procedures outlined by Strange et al. (1983). Each syllable was divided into
three portions. (1) The initial portion of a syllable included the release burst of its initial /b/
plus 20% of the voiced region. (2) The central portion included the middle 60% of the voiced
region. (3) The final portion included the final 20% of voicing plus the closure and release of
the syllable-final /b/. 2 All measurements were made to the nearest zero-crossing of the speech

1 These figures are based on measurements of the nine vowels for which Peterson and Barney
(1952) provide a comparison (/i, ɪ, ɛ, æ, ɑ, ʌ, ɔ, ʊ, u/). When we include in our analysis the
vowels /e, o/, the woman's vowel formant frequencies differ from the man's by an average of 18%,
21%, and 17% for F1, F2, and F3, respectively.

2 Our editing procedures differed from those of Strange et al. (1983) and Jenkins et al. (1983) in
terms of the percentage of the voiced region assigned to initial, center, and final syllable portions.
Our choice of 60% as the center proportion is larger, on average, than their value, which varied
from 50-60% depending on vowel category. As a result, our silent-center and hybrid-silent-center
conditions involve a more severe deletion of signal.
Table 3

The Woman's and Man's Formant Frequencies (F) in Hz,
Rank Ordered and Differenced (F-Fprev).

             First Formant       Second Formant      Third Formant
Talker       F      F-Fprev      F      F-Fprev      F      F-Fprev

woman        320                1000                 2560
             320       0        1160      160        2760      200
             320       0        1160        0        2760        0
             400      80        1240       80        2760        0
             480      80        1240        0        2840       80
             480       0        1240        0        2920       80
             560      80        1840      600        2920        0
             640      80        2080      240        2920        0
             640       0        2080        0        2920        0
             640       0        2240      160        3000       80
             720      80        2480      240        3240      240

MEAN                  40                  148                   68

man          320                 840                 2160
             320       0         920       80        2320      160
             400      80        1000       80        2400       80
             400       0        1000        0        2480       80
             400       0        1080       80        2480        0
             480      80        1160       80        2480        0
             480       0        1480      320        2480        0
             480       0        1520       40        2480        0
             480       0        1760      240        2480        0
             560      80        1920      160        2560       80
             560       0        2080      160        2840      280

MEAN                  24                  124                   68




waveform. Various combinations of the syllable portions were used to prepare the stimuli for five 
experimental conditions, as illustrated in Figure 3. 
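
The three-way division and the recombination of portions into edited stimuli might be sketched as follows. This is a minimal sketch, not the editing software used in the study: the function names, the list-of-samples representation, the outward search for a sign change, and the choice to take the hybrid's silent interval from the talker who supplies the initial portion are assumptions made for illustration.

```python
import math


def nearest_zero_crossing(signal, index):
    """Return the sample index nearest `index` at which the waveform changes sign."""
    n = len(signal)
    for offset in range(n):
        for i in (index - offset, index + offset):
            if 0 < i < n and (signal[i - 1] < 0) != (signal[i] < 0):
                return i
    return index  # no sign change found; fall back to the requested index


def split_syllable(signal, voiced_start, voiced_end):
    """Cut at 20% and 80% of the voiced region, snapped to zero crossings.

    Samples before the voiced region (the initial burst) stay with the initial
    portion; samples after it (closure and release) stay with the final portion.
    """
    voiced_length = voiced_end - voiced_start
    cut_1 = nearest_zero_crossing(signal, voiced_start + round(0.2 * voiced_length))
    cut_2 = nearest_zero_crossing(signal, voiced_start + round(0.8 * voiced_length))
    return signal[:cut_1], signal[cut_1:cut_2], signal[cut_2:]


def silent_center(initial, center, final):
    """Replace the central portion with silence of the same duration."""
    return initial + [0.0] * len(center) + final


def hybrid_silent_center(initial_a, center_a, final_b):
    """Talker A's initial portion, A's silent interval, talker B's final portion."""
    return initial_a + [0.0] * len(center_a) + final_b


# Toy waveform standing in for a digitized syllable (hypothetical values).
toy = [math.sin(2 * math.pi * (k + 0.5) / 100) for k in range(1000)]
toy_initial, toy_center, toy_final = split_syllable(toy, 0, len(toy))
toy_silent = silent_center(toy_initial, toy_center, toy_final)
```

Cutting at zero crossings avoids introducing abrupt amplitude steps (audible clicks) at the edit points.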

Whole syllables. For the whole-syllable condition, all three syllable portions were presented 
in their original temporal relation (i.e., the syllables were unedited). An example of a whole 
syllable, the woman's /baeb/, is shown at the top of Figure 3. There were 22 whole-syllable 
stimuli, 11 different syllables produced by each of the two talkers. These syllables are comparable 
to the "Control" syllables of Strange et al. (1983) and Jenkins et al. (1983). 
Figure 2. Distribution of the man's and woman's vowels in an F1/F2 space. (Horizontal axis: frequency of first formant, in Hz.)

Silent centers. Second from the top is an example of a silent-center syllable. The central 
portion of the woman's /baeb/ has been excised and replaced by silence in this instance. We 
created one silent-center version of each of the 22 syllables. 

Hybrid silent centers. Third from the top of the figure is an example of a hybrid-silent- 
center syllable combining the initial portion of the woman's (W) /baeb/ with the final portion 
of the man's (M) /baeb/. The silent interval separating these portions was the same as for the
woman's silent-center /baeb/. Eleven W/M and 11 M/W hybrids comprised the stimuli of this 
condition. 

Initial portions. The 22 initial syllable portions provided the materials for this condition.

Final portions. The 22 final syllable portions provided the materials for this condition.

Subjects

The subjects of this study were undergraduates enrolled in an introductory psychology course.
Their participation partially fulfilled a course requirement. All of the subjects were native speakers 
of English. They had no known hearing difficulties and they had no knowledge of the hypotheses 
under test. The subjects were randomly assigned to one of the five experimental conditions, 
distributed as follows: whole-syllable condition (n = 10), silent centers (n = 15), hybrid silent
centers (n = 12), initial portions (n = 11), final portions (n = 11). 
Figure 3. Sample tokens of stimuli from the five experimental conditions (whole syllable, silent center, hybrid
silent center, initial portion, final portion). All stimuli derive from the woman's and man's productions of /baeb/.



Procedure

Stimuli were presented through headphones at a comfortable listening level. A separate
group of listeners judged the stimuli of each condition. The subjects were told that they would
be hearing edited versions of natural speech, and that they were to decide which of 11 alternative
vowels best matched the vowel that they heard on each trial. Their decisions were reported by
circling one of 11 /b/-vowel-/b/ words written in English orthography.

Prior to testing, the subjects listened to a demonstration sequence and then completed a block
of practice trials. The demonstration sequence consisted of two randomized presentations of the
22 whole-syllable stimuli with two-second pauses between them. The practice block consisted of
two randomized presentations of the 22 stimuli of the condition to be tested, with four-second
pauses between them. The subjects were required to make responses to the practice stimuli so
that they would become familiar with the answer sheet; they were given no feedback as to the
accuracy of those responses.

After the practice block, the subjects were allowed to ask questions of clarification about
the testing procedure. The testing session commenced immediately after these questions. There
were a total of 220 test trials, 10 randomized presentations of the 22 stimuli for a condition. A
four-second pause separated succeeding stimuli. Subjects were given a five-minute break halfway
through the test.
Figure 4. Vowel identification error rates for the five experimental conditions (whole syllables, silent centers,
hybrid silent centers, initial portions, final portions). Errors are pooled over 11 vowels and over two talkers.

Results and Discussion 

The overall results for the five listening conditions are displayed in Figure 4. Each bar denotes 
the mean percentage of errors in vowel identification for the indicated condition, where an error 
was defined as a failure to categorize a vowel in the same way that the talker intended. Mean
percentage errors by condition were as follows: whole syllables (8.8%), silent centers (23.1%), hybrid
silent centers (27.4%), initial portions (56.4%), final portions (73.8%). Analysis of variance
showed the differences in error rates across conditions to be highly significant: F(4,54) = 144.6,
p < 0.001. Post hoc tests (Newman-Keuls) revealed that all pairwise differences among the conditions
were significant (p < 0.01) with one exception: There was no statistically significant
difference between the silent-center and hybrid-silent-center conditions (p > 0.05). 
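
The degrees of freedom of the reported F ratio follow from the group sizes given under Subjects, and the F ratio itself is the between-group mean square divided by the within-group mean square. The sketch below shows both computations. The per-subject error rates are not given in the text, so the one-way ANOVA function is exercised only on toy numbers that are hypothetical and unrelated to the study.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    ss_within = sum((score - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for score in group)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Group sizes from the Subjects section: whole syllable, silent center,
# hybrid silent center, initial portions, final portions.
GROUP_SIZES = [10, 15, 12, 11, 11]
df_between = len(GROUP_SIZES) - 1
df_within = sum(GROUP_SIZES) - len(GROUP_SIZES)

# Hypothetical toy scores, used only to exercise the function.
toy_f, toy_df_between, toy_df_within = one_way_anova(
    [[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```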

Comparison with Previous Silent-center Studies 

Our results replicate and extend the central finding of previous studies examining silent-center 
stimuli (Jenkins et al., 1983; Strange et al., 1983), namely, that subjects can identify vowels with
good accuracy when syllable centers are silenced. Our results also replicate the previous finding
that vowel perception is poor when either initial syllable portions or final portions are presented
alone. These results imply, on the one hand, that the dynamic beginnings and endings of syllables
are a rich source of information about the syllable vowel and, on the other, that the information
is somehow conveyed relationally by those beginnings and endings.

One contrast with past studies is our observation of a significant difference between the 
silent-center and whole-syllable conditions. Previous investigators found no differences between 
these two conditions (Jenkins et al., 1983; Strange et al., 1983). We may have found a difference
in this study because, on average, we deleted a somewhat greater portion of the signal in our
silent-center condition than was deleted by others (see footnote 2). Other possible explanations
are that there were between-study differences in familiarization with the materials, or in other
aspects of the training, or in the subject populations themselves. The overall error rates for both
our whole-syllable and silent-center conditions were higher than those seen in previous studies,
indicating that others were operating much nearer to the error "floor."
Figure 5. Scatter plot of errors for 11 vowels presented in silent-center (abscissa) and hybrid-silent-center
(ordinate) conditions. Silent-center errors are collapsed across two talkers; hybrid-silent-center errors are collapsed
across man-woman and woman-man hybrids. The coefficient of correlation (r) is also provided.

Silent Centers vs. Hybrid Silent Centers 

Of greatest interest to us was the finding that the hybrid and silent-center conditions did 
not differ significantly. This strongly suggests that the vowel information preserved in silent 
center syllables is also preserved in hybrids, despite their change of source. That suggestion 
is strengthened further by a vowel-by-vowel comparison of errors made in the silent-center and
hybrid conditions. The comparison is illustrated in Figure 5. Plotted on the abscissa are errors 
for the silent-center syllables (collapsed over talkers) and on the ordinate are errors for the hybrid 
syllables (collapsed M/W and W/M versions). The two data sets are highly correlated (r = 0.80;
p < 0.01), and the clustering of points about the diagonal of the figure demonstrates how similar
the errors are in absolute terms. The implication of all of these results is that the vowel information 
in dynamic regions of a syllable is largely invariant across talkers. It is highly unlikely that this 
dynamic information subserves the perceptual extraction of any sort of acoustic target, since 
targets are highly variant across talkers. It is much more likely that the information is indicative 
of a characteristic articulatory style that is common to productions of the same vowel by talkers 
of the same dialect. 
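
The vowel-by-vowel comparison above amounts to a Pearson correlation over paired error rates. The following is a minimal sketch of that computation; the function name and the four-vowel error lists are hypothetical placeholders, not the data of Experiment 1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-vowel percent errors (placeholders only).
silent_center_errors = [10.0, 25.0, 40.0, 15.0]
hybrid_errors = [12.0, 30.0, 38.0, 20.0]
r = pearson_r(silent_center_errors, hybrid_errors)
print(round(r, 2))
```

Points that cluster about the diagonal, as described in the text, would additionally require the two lists to be similar in absolute value, which a correlation coefficient alone does not guarantee.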

The Role of Syllable Duration 

Following others (Jenkins et al., 1983; Strange et al., 1983), we have proposed that the 
dynamic information for vowels is carried relationally by the initial and final syllable portions. 
Perhaps the simplest relation that might carry it is a durational one. One could imagine that
information about the duration of the syllable as a whole could help a listener to distinguish 
between spectrally-similar, durationally-different vowels in the syllable nucleus. Two lines of 
evidence speak against this hypothesis. The first comes from a previous study (Strange et al., 
1983) that included conditions in which durational differences among silent-center syllables were
neutralized. In one condition all of the silent intervals were set equal to the shortest silent duration
in the test set and in another they were set equal to the longest. Neither manipulation significantly 
affected the outcome when all stimuli were produced by a single talker. The "lengthening" 
manipulation did produce a small but significant increase in errors when different talkers' syllables 
were interspersed; however, this increase was manifest for vowels of all categories, not just for 
the short vowels, suggesting that factors other than vowel duration were affected. Overall, there 
was very little evidence that durational differences among silent-center syllables are an important 
source of vowel information. 

Very little evidence of this can be found in our own results as well. If duration were a primary 
factor, one would expect the lower error rates for silent centers and hybrids, relative to the initial 
and final syllable portions, to be due primarily to a reduction in short-long vowel confusions. 
Short-long vowel errors would be high for the isolated portions (where only spectral information 
is available), and low for the silent centers, because these syllables presumably supply the duration
information needed to distinguish between spectrally-similar short and long vowels. 

The first row of Table 4 provides a summary of errors for four spectrally-similar, durationally- 
different pairs of monophthongs, for each condition of Experiment 1. The second row of the 
table presents overall errors for the eight vowels after short-long confusions have been removed. 
The third row summarizes the errors specifically due to short-long confusions. With respect to 
the duration hypothesis, two observations seem important. First, by the strong form of this 
hypothesis, errors on isolated portions are due primarily to short-long ("duration") confusions, 
and overall errors should therefore be roughly equal for silent centers and for the isolated portions 
after duration errors have been removed. The data in Table 4 (second row) do not support this 
prediction. Second, while more duration errors are observed for isolated portions than for silent 
centers (third row), the proportion of errors attributable to short-long confusions stays relatively 
constant across these conditions (see fourth row of the table). This suggests that the silent-center 
format does not differentially reduce duration-based errors, but has a broader, and different, 
kind of impact in reducing perceptual errors. Parametric studies using a broader set of stimulus 
materials will be needed to address this question further. 
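
The second and fourth rows of Table 4 follow arithmetically from the first and third rows. As a check on that relationship, the sketch below recomputes them from the tabled values; the variable names are illustrative only.

```python
conditions = ["whole syllable", "silent center", "hybrid silent center",
              "initial portion", "final portion"]
overall = [8.3, 20.1, 26.3, 47.9, 65.9]      # Table 4, row 1 (% errors)
short_long = [4.8, 8.3, 8.5, 21.6, 26.2]     # Table 4, row 3 (% errors)

# Row 2: errors remaining once short-long confusions are removed.
excluding = [round(o - s, 1) for o, s in zip(overall, short_long)]
# Row 4: short-long confusions as a proportion of overall errors.
proportion = [round(s / o, 2) for o, s in zip(overall, short_long)]

for name, e, p in zip(conditions, excluding, proportion):
    print(f"{name:22s} excluding={e:5.1f}  proportion={p:.2f}")
```

The near-constant proportion across the silent-center and isolated-portion conditions is the pattern the text takes as evidence against the duration hypothesis.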

Modeling the Relationship between Initial and Final Portions 

If the initial and final syllable portions are not affording listeners a better estimate of intrinsic 
vowel duration, then how is it that perception is so much better in the silent-center and hybrid- 
silent-center conditions? One might argue that it is better because in these conditions listeners 
are, in effect, given two chances to identify the vowel, one chance based on the initial portion and 
a second based on the final portion. In this section we consider this alternative. 

How might initial-portion and final-portion percepts be processed to derive a single vowel 
judgment? The simplest possibility is that those percepts are, for each vowel, perfectly independent
and that a listener simply chooses between them at random. If so, we would expect
that errors in the silent-center and hybrid conditions should average 71% (the mean of initial-
and final-portion error rates). A nonrandom selection process could, at best, produce error rates
of 56% (taking the better of the initial- and final-portion rates for each vowel). Even the latter 
prediction is much higher than the actual error rates observed for silent centers (23%) and hybrids 
(27%). Moreover, it poorly predicts the ordering of error rates across vowels: The correlation
between the nonrandom guessing prediction and the observed errors for silent centers was 0.41,
and for hybrids it was 0.42.

Table 4

Mean Percentage Errors on Eight Vowels, /i, ɪ, ɛ, æ, ɑ, ʌ, ʊ, u/,
Including and Excluding Confusions on Adjacent Short-long Vowels

                               Whole      Silent    Hybrid Silent   Initial    Final
                               Syllable   Center    Center          Portion    Portion

Overall errors                   8.3       20.1        26.3          47.9       65.9
Overall errors, excluding
  short-long errors(1)           3.5       11.8        17.8          26.3       39.7
Short-long errors                4.8        8.3         8.5          21.6       26.2
Proportion*                      0.58       0.41        0.32          0.45       0.40

(1) Short-long errors are confusions within any one of the following four vowel pairs:
/ɪ-i/, /ɛ-æ/, /ʌ-ɑ/, /ʊ-u/.

*Short-long vowel errors as a proportion of overall errors.
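
The two guessing models described above can be sketched as follows. This is an illustration under stated assumptions: the function names are ours, and the three-vowel error rates are hypothetical placeholders rather than the observed initial- and final-portion data.

```python
def random_choice_error(initial, final):
    """Expected error if the listener picks one percept at random per vowel."""
    per_vowel = [(i + f) / 2 for i, f in zip(initial, final)]
    return sum(per_vowel) / len(per_vowel)

def best_portion_error(initial, final):
    """Best case: the listener always relies on the better portion per vowel."""
    per_vowel = [min(i, f) for i, f in zip(initial, final)]
    return sum(per_vowel) / len(per_vowel)

# Hypothetical percent errors for three vowels (placeholders, not study data).
initial_errors = [40.0, 60.0, 80.0]
final_errors = [70.0, 50.0, 90.0]
print(random_choice_error(initial_errors, final_errors))
print(best_portion_error(initial_errors, final_errors))
```

Because the best-portion figure can never exceed the random-choice figure, observed error rates well below it, as reported for silent centers and hybrids, cannot be produced by either form of guessing between separately analyzed portions.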

One might propose a more sophisticated decision model in which the initial- and final-portion 
percepts are processed in contingent fashion to arrive at a vowel response. For example, the 
initial portion could be used to narrow down the set of alternatives and the final portion to make 
a selection from among this reduced set. A good candidate for the initial classification is the 
intersection of two major phonetic dimensions: high-vs.-low and front-vs.-back. With respect 
to this four-way classification, listeners made an average of 26% errors when categorizing vowels
in the initial-portion condition (excluding the diphthongs /e, o/). Estimates of the probability 
for error when making the final selection within these categories can be derived from our data 
on the final portions. When the probabilities for error in the two stages are combined, one 
obtains a predicted error rate for judgments on the silent-center and hybrid syllables as a whole (see footnote 3).
Figure 6 shows the comparison between predicted and observed errors for the hybrid condition
(the silent-center comparison looks similar). Like the previous models, this contingent model
generally overpredicts the absolute level of errors and poorly predicts the patterning of errors
among the vowels. The correlations between predicted and observed errors were 0.27 for hybrid
vowels and 0.50 for silent-center vowels.

3 For example, in the final-portion condition the high-back vowels /u/ and /u/ were confused
with one another on 8% (/u-u/) and 34% (/u-u/) of trials. These percentages, in combination
with the probability of making an error when categorizing a high-back vowel as high-back in the
initial-portion condition (20%), provided our contingent estimates for /u/ and /u/.

Figure 6. Scatter plot (and correlation) of errors on hybrid silent-center syllables, as predicted by a
contingent-judgment model (abscissa) and as observed in the identification test (ordinate).
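
The text does not spell out how the error probabilities of the two stages were combined. One simple possibility, assumed here purely for illustration, is that the stages err independently, so that a response is correct only when both the initial classification and the final selection are correct. The sketch applies that assumed rule to the percentages quoted in footnote 3.

```python
def contingent_error(p_category_error, p_within_error):
    """Probability of a wrong vowel response under a two-stage decision.

    Assumption (not stated in the text): the two stages err independently,
    so the response is correct only if both stages are correct.
    """
    return 1.0 - (1.0 - p_category_error) * (1.0 - p_within_error)

# Footnote 3 values: 20% category errors; 8% and 34% within-pair confusions.
for p_within in (0.08, 0.34):
    print(round(contingent_error(0.20, p_within), 3))
```

Under this rule the combined error can never fall below the error rate of either stage alone, which is consistent with the model's tendency to overpredict errors.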

The models we tested all assumed that the syllable portions were analyzed separately, and 
were only related at a late stage in a decision process. This type of perceptual analysis would 
seem to be demanded by the target-extraction view, which proposes that on-glide and off-glide 
functions separately specify a target. In particular, separate perceptual analyses would seem to be 
the only way the target view could approach the perception of hybrid syllables, since the syllable 
portions specify very different asymptotic targets in this case. However, all of the "separate 
analysis" models underpredict listeners' accuracy on the hybrids by a wide margin. This strongly 
suggests that a listener does not process the syllable portions separately but, instead, derives 
vowel information from a "superadditive" relation between them. In other words, it suggests that 
some singular function over the two portions of a hybrid is detected by the perceiver as the basis 
for a vowel judgment. 

This account of the hybrid-syllable results is compatible with the event hypothesis, which
holds that the early and late stages of an event should bear a principled relation to one another. 
Defining such relations in acoustic terms is a major challenge for future research. The simplest 
possibility is to define a duration measure over the hybrid syllable as a whole. However, neither 
our results nor those of Strange et al. (1983) provide much support for syllable duration as the 
critical "superadditive" relation (see previous section). More complicated possibilities involve 
characteristic frequency and amplitude modulations over the syllable. Whatever function we may 
discover, our hybrid data suggest that it will be talker-independent in form, and that it will not 
be the sum of two exponentials sharing a common asymptote.

The judgment models also raise questions about the role of more local sources of information 
for vowel identity. The contingent model, for example, considered the possibility that different
regions of a syllable provide different kinds of information. While that particular model proved
uninformative, there was some evidence in listeners' errors on the isolated portions that the early 
and late regions of a syllable carry some information about vowel properties. Listeners in both 
conditions showed better-than-chance performance (chance would be 91% errors). Also, as we 
noted above, the initial portions of the monophthongs carried sufficient information to support 
four-way classification (high-low, front-back) with only 26% errors. A similar analysis of errors 
on final portions shows 33% errors for the four-way classification. 
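
The chance level quoted above follows from the number of response alternatives. As a small worked example, assuming that a guessing listener chooses uniformly among the 11 vowel alternatives:

```python
def chance_error(n_alternatives):
    """Expected proportion of errors when guessing uniformly at random."""
    return 1.0 - 1.0 / n_alternatives

print(round(100 * chance_error(11)))  # 11 vowel response alternatives
print(round(100 * chance_error(4)))   # four-way high-low, front-back classes
```

On the same assumption, the 26% and 33% error rates for the four-way classification lie well below the chance level for four equally likely classes.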

These results on the isolated portions raise a second challenge for future research: to identify 
the carriers of information in these more local regions of a syllable. In the case of the initial 
portions, one candidate is the release burst of initial stop consonants. In fact, several studies have 
reported that this brief initial phase of a syllable is sufficient for better-than-chance discrimination 
within small sets of vowels (Blumstein & Stevens, 1980; Winitz, Scheib, & Reeds, 1972). The
acoustic basis for these effects is still not clear, nor is it clear how well listeners could do on 
a larger, more representative set of vowels. Even so, these findings provide a good example of 
a general principle we seek to develop in this paper: The transient regions of a syllable may 
provide information that is specific to a vowel without necessarily being information about a 
target state. A rough analogy can be drawn to the role of onset transients in the identification of 
musical instruments. The dynamic structure of these transients carries more information about 
instrument identity than does the steady-state region of a sustained tone (Grey & Gordon, 1978; 
Luce & Clark, 1967; Saldanha & Corso, 1964). More to the point, the transients do not simply 
aid the extraction of steady-state timbre; they provide information that is different in kind. In
the case of vowels, we expect to find a similar pattern: namely, that the structure of a talker's
onset transients is both specific to the vowel and distinct from spectral targets.

EXPERIMENT 2: SOURCE PERCEPTION 

After the completion of each session of testing in Experiment 1, we informally interviewed 
subjects about their impressions of the edited syllables and were surprised to discover that subjects 
in the hybrid condition rarely heard a complete change of source. Instead, they heard a single 
talker, typically a male, and, more particularly, a male prone to abrupt pitch changes. These
reports were surprising because the hybrid stimuli contain marked discontinuities of fundamental 
and formant frequencies, and these would normally be expected to specify a change of articulatory 
source. The perceptual reports suggest that the hybrid syllables contain other types of acoustic 
information, which strongly specify a single production by a single source. In the normal course 
of events, this acoustic structure would parallel other information about the source, such as 
fundamental frequency and formant frequency contours. However, in the unique case of the 
hybrid stimuli, it opposes these other sources of information and appears to predominate over 
them. Since this speculation has implications for the study of source perception, we thought it 
important to make a more rigorous test of the findings that prompted it. In Experiment 2, we 
directly sought subjects' judgments of the number of talkers they heard when listening to hybrid 
silent-center (and silent-center) stimuli. 






Method 

Subjects 

Nine undergraduate students were the subjects of this experiment. They were native speakers 
of English with normal hearing. They had no contact with the subjects of Experiment 1 and were 
not themselves subjects of that experiment. 

Stimuli 

The stimuli of this experiment were the silent-center and hybrid-silent-center stimuli de- 
scribed in Experiment 1. 

Procedure 

Ten randomized repetitions of the 22 silent-center stimuli, spaced at four-second intervals,
comprised a silent-center test block. A comparable arrangement of the 22 hybrid stimuli comprised 
a hybrid test block. Each test block was presented to subjects twice, in alternation. Five subjects 
began with the silent-center block, four began with the hybrid block. The subjects' task was to 
determine which of the following three alternatives best described the source(s) of the stimuli
heard on each trial: (1) One talker speaking with normal intonation; (2) One talker speaking 
with a pitch change; (3) Two talkers speaking. Responses were reported by checking off the 
appropriate alternative on an answer sheet. 

Prior to testing, subjects completed a practice block in which they responded to one presen- 
tation of each hybrid and silent-center stimulus. The order of these presentations was randomized. 
The subjects received no feedback regarding the accuracy of their responses. The practice block 
was followed by a pause for questions regarding procedure, and then by the first test block. There 
was a five-minute break between the test blocks. All testing was completed in a single session. 
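
The construction of a test block can be sketched as follows. The sketch assumes that each of the ten repetitions of the 22-item set was shuffled separately (the text does not say whether randomization was within or across repetitions), and the stimulus labels and seed are hypothetical.

```python
import random

def make_block(stimuli, repetitions=10, seed=None):
    """Return a trial list in which each repetition of the set is shuffled."""
    rng = random.Random(seed)
    trials = []
    for _ in range(repetitions):
        order = list(stimuli)
        rng.shuffle(order)
        trials.extend(order)
    return trials

stimuli = [f"stimulus_{k:02d}" for k in range(1, 23)]  # hypothetical labels
block = make_block(stimuli, repetitions=10, seed=1)
onset_times_s = [4 * n for n in range(len(block))]     # four-second spacing
print(len(block), onset_times_s[-1])
```

With 220 trials at four-second intervals, a block of this kind runs just under 15 minutes, so the two blocks presented twice each fill roughly an hour of listening.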

Results and Discussion 

The results of this experiment are summarized in Figure 7, which shows the proportion of 
silent-center and hybrid responses in each category (collapsed across the two orders of presen- 
tation). The results confirm the informal reports given by subjects in the hybrid condition of 
Experiment 1: Hybrid stimuli are most generally perceived to have been produced by a single 
talker. They were so perceived on a total of 75% of the trials in the present experiment. That 
percentage was only slightly smaller than the total percentage of single-talker responses for silent- 
center syllables (82%). The principal difference between hybrid and silent-center responses was in 
their distribution over the two single-talker categories. With silent-center stimuli, subjects more 
often judged that the talker spoke with normal intonation (57% of all judgments), while with 
hybrid stimuli, subjects more often heard a pitch change (43% of all judgments). 
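
The percentages reported above imply approximate values for the remaining response categories. The sketch below derives them on the assumption that the three alternatives are exhaustive, so that the percentages for each stimulus type sum to 100; the derived figures are therefore subject to rounding in the reported values.

```python
def response_breakdown(single_talker_total, reported_single_category):
    """Return (other single-talker category, two-talker) percentages.

    Assumes the three response alternatives are exhaustive (sum to 100).
    """
    other_single_category = single_talker_total - reported_single_category
    two_talkers = 100 - single_talker_total
    return other_single_category, two_talkers

# Silent-center: 82% single-talker responses, 57% "normal intonation".
print(response_breakdown(82, 57))
# Hybrid: 75% single-talker responses, 43% "pitch change".
print(response_breakdown(75, 43))
```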

Listeners' judgments that the hybrid stimuli derived from a single source may have been 
facilitated by the presence of the silent gap between the initial and final portions spoken by the 
different talkers. The stimuli did not contain instantaneous changes in fundamental frequency and 
formant contours. Instead, those contours were heard to be interrupted at one point and resumed 
at another. Perhaps in such cases it is reasonable for listeners to ascribe the gap's "bridge" to the 
rather curious behavior of a single talker. If so, we would note that there is a strong asymmetry
in those ascriptions. With both woman/man and man/woman hybrids, listeners nearly always
reported that the single talker was a man. For some reason his vocal characteristics predominated.

Figure 7. Proportion of trials on which subjects judged regular (single-talker) and hybrid silent-center stimuli to
have been produced by: (1) one talker speaking with a normal pitch; (2) one talker speaking with an abrupt pitch
change; or (3) two talkers.

We would also note that few perceptual gaps can be "bridged" so readily as the hybrid gap.
While a pitch break of the magnitude seen across the initial and final portions is conceivable for a 
single talker, a formant pattern break of the magnitude seen (15-20%) is inconceivable. (It would 
require a change in the talker's age or sex in mid-utterance.) Listeners integrated the syllable 
portions in spite of this radical change in effective vocal tract dimensions, and this suggests 
that other, more powerful information for source continuity was present in the acoustic signal. It
seems likely that listeners were strongly aided in bridging the silent gap by the common style with 
which the two talkers produced the original syllables. The two talkers spoke the same dialect 
and produced the same vowel gestures, in the same phonetic context, under the same timing
regimen (matching the beats of a metronome). The close similarity of their articulatory styles 
would produce, as a natural consequence, a close similarity of acoustic "styles of change" in their 
productions. These dynamic consequences of "producing the same vowel with the same timing" 
may be the basis for subjects' integrating the two portions perceptually and hearing them as the
product of a common source. Given the composition of the hybrid syllables, we can conclude
that this acoustic information is defined over the syllable as a whole, and, in particular, that it is
defined sufficiently by a relation between the initial and final regions of the syllable.



CONCLUSION 



Experiments 1 and 2 provide strong indications that the perception of vowel identity and 
source continuity is sensitive to dynamic acoustic structure defined over the course of a whole
syllable. The acoustic information appears to be distinct in type from such variables as syllable
duration and spectral targets (whether realized in the signal or extrapolated). Vowel perception 
and source perception can be remarkably impervious to discontinuities in local spectrum, if speech 
materials are otherwise matched in timing and articulatory style. This strongly suggests that 
a dialect's vowels can be characterized by higher-order variables (patterns of articulatory and
spectral change) that are independent of a specific talker's vocal tract dimensions. A more
precise definition of these variables will aid our understanding of the acoustic basis for identifying
a vowel and, not coincidentally, for perceiving an articulation as continuous.

CONTROLLED VARIABLES IN SENTENCE INTONATION*



Carole E. Gelfer,† Katherine S. Harris,† and Thomas Baer



INTRODUCTION 

In describing the acoustic characteristics of sentence intonation, the terms downdrift and
declination have been applied to the behavior of both the rapid variations in fundamental frequency
(F0) corresponding to syllable prominences whose peaks comprise the envelope of an F0
contour (see, for example, Cooper & Sorenson, 1981), and the slower variation in F0 that defines
a reference level upon which these local prominences are superimposed (see, for example, Cohen,
Collier, & 't Hart, 1982). Recently, there has been considerable interest in the mental representation
of various aspects of declination (Breckenridge, 1977; Cooper & Sorenson, 1981; Liberman
& Pierrehumbert, 1982; Pierrehumbert, 1979) and, by extension, the control or regulation of the
physiological variables involved in its realization (Atkinson, 1973; Collier, 1975; Gelfer, Harris,
Collier, & Baer, 1985; Maeda, 1976). Unfortunately, cognitive processes are not readily observable.
However, to the extent that they are expected to have some physical reality, examining the
patterns of control of the physiological processes that ultimately bear on the acoustic aspects of
sentence intonation should provide some insight into the psychological reality of declination.

In the first part of this paper, we will examine the behavior of subglottal pressure (Ps)
during speech in order to determine whether the time course of the drop in subglottal pressure
associated with declination is a controlled variable in sentence intonation, or, alternatively, the
passive consequence of lung deflation. Obviously, the rate at which air is used in producing
speech depends on the phonetic characteristics of utterances (Klatt, Stevens, & Mead, 1968). For
example, because of the reduced airflow resistance at the glottis and the configuration of the vocal
tract for a voiceless fricative, substantially higher airflow rates occur for utterances containing the
syllable /fa/ than for those containing syllables composed of voiced continuants, such as /ma/.
If the lungs were allowed to deflate passively, we would expect subglottal pressure to decline at
different rates over the course of these syllables. However, there is evidence indicating that lung
deflation during speech is not a purely passive phenomenon. For example, Draper, Ladefoged,
and Whitteridge (1960) and Mead, Bouhuys, and Proctor (1968) found subglottal pressure to be
stable throughout sustained voice production, thus suggesting that the muscles of the respiratory
system are marshalled in such a way as to maintain Ps. However, these studies have examined
only sustained phonations of constant amplitudes that also require constant pressures. On the
other hand, subglottal pressures during speech are known to vary dynamically. What we do not
know, then, is whether the variation in pressure over time is the natural by-product of unchecked
expiratory forces, or whether it reflects ongoing control of the respiratory musculature in order
to produce dynamically stable pressures. By using reiterant speech (Kelso, V.-Bateson, Saltzman,
& Kay, 1985; Larkey, 1983) in which a sentence is mimicked with a high flow syllable, "fa," or
a low flow syllable, "ma," we can discover whether the time course of pressure variation is the
by-product of unchecked expiratory forces, or whether it is dynamically stable. Moreover, to the
extent that F0 mirrors Ps, we can perhaps gain insight into the factors responsible for declination
itself.

*In T. Baer, C. Sasaki, & K. Harris (Eds.), Laryngeal function in phonation and respiration
(pp. 422-435). Boston: College-Hill Press, 1987.

In the second part of this paper, we will address the phenomenon known as F0 resetting.
It has been suggested that the declination function is sensitive to the syntactic structure of
an utterance. Thus, in a two-clause utterance, the F0 contour may be discontinuous at the
major syntactic boundary so that a single falling contour no longer characterizes the declination
function (Cooper & Sorenson, 1981; Fujisaki & Hirose, 1982; Maeda, 1976). However, there is
some question as to which aspect of the F0 trajectory actually defines resetting in these instances.
For example, Fujisaki and his colleagues (Fujisaki & Hirose, 1982; Fujisaki, Hirose, & Ohta, 1979)
have developed a model of intonation that allows for two basic inputs, the phrase level and accent
level commands, which are realized as the 'voicing' (baseline) and 'accent' (syllabic) components,
respectively. According to this model, it is the voicing component that may be reset at clause 
boundaries, while the accent components vary independently of the baseline, and, therefore, 
independently of syntactic structure. 
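
To make the two-component idea concrete, the sketch below follows a commonly cited formulation of this kind of command-response model, in which phrase commands are impulses and accent commands are steps whose responses are summed in the logarithm of F0. The equations as written here, the parameter values (alpha, beta, gamma), and the command timings are assumptions for illustration and are not taken from the present paper.

```python
import math

def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase ('voicing') control mechanism."""
    return alpha ** 2 * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def f0_contour(t, base_hz, phrase_cmds, accent_cmds):
    """F0 (Hz) at time t; components are summed in log F0."""
    log_f0 = math.log(base_hz)
    for onset, amplitude in phrase_cmds:
        log_f0 += amplitude * phrase_response(t - onset)
    for start, end, amplitude in accent_cmds:
        log_f0 += amplitude * (accent_response(t - start) - accent_response(t - end))
    return math.exp(log_f0)

# Hypothetical two-clause utterance: a second phrase command at 1.5 s
# resets the baseline, while the accent commands are specified separately.
phrase_cmds = [(0.0, 0.5), (1.5, 0.4)]
accent_cmds = [(0.3, 0.6, 0.3), (1.8, 2.1, 0.3)]
contour = [f0_contour(0.1 * k, 100.0, phrase_cmds, accent_cmds) for k in range(30)]
```

In this sketch, resetting corresponds to the second entry in `phrase_cmds`; removing it leaves the accent peaks in place but lets the baseline continue to fall across the clause boundary.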

Cooper and Sorenson (1981) suggest, too, that declination is reset at clause boundaries in a 
way that is relevant to the syntactic structure of an utterance. However, in contrast to Fujisaki,
they measure declination, and thus gauge resetting, on the basis of the relationship of syllable
peaks; specifically, the height of the first peak in a second clause to that of a sentence-initial peak.
Furthermore, they suggest that the resetting of peak F0 directly mirrors a speaker's intention to
signal the syntactic structure of the sentence, and that resetting is planned in some detail at
the outset of an utterance. While we recognize that there is an interaction between syntax and
the realization of sentence intonation, we hypothesize that the extent to which F0 is reset is not
planned prior to the execution of an utterance even if the presence or absence of resetting may
be planned. Fujisaki has suggested that resetting is triggered when a significant pause occurs at
the clause boundary. Taking this notion a step further, we would suggest instead that it is not
only the pause but also the new inspiration that may accompany it that in turn influences F0
indirectly through the resetting of such variables as subglottal pressure and/or laryngeal muscle
activity. Thus, we hypothesize that F0 resetting will depend on the presence or absence of a pause
and inspiration at clause boundaries. 

METHODS 

Two speakers served as subjects for the first part of this study, and one of the two served as a 
subject for the second. Both are native speakers of Dutch, fluent in English, and both were aware 
of at least some of the purposes of this work. They were chosen as subjects primarily because of
their willingness, and ability, to tolerate the invasive procedures required. 

Lung volume was inferred from the calibrated sum of thoracic and abdominal signals from a 
Respitrace inductive plethysmograph, and airflow rate (cc/sec) was derived from calculations of
volume over time. Subglottal pressure was recorded directly, but differently, for the two subjects,
RC and LB. For RC, a pressure transducer (Setra Systems 236L) was coupled to the subglottal
space by means of a cannula inserted percutaneously through the cricothyroid membrane. For 
LB, a miniature pressure transducer (Millar SPC-350) was introduced pernasally through the 
posterior glottis into the trachea. While the percutaneous approach is certainly the more invasive 
procedure, it provides a signal that is easier to calibrate, because the miniature transducer cannot 
be calibrated outside the body, and it is highly sensitive to changes in temperature that occur 
within the trachea upon inspiration (Cranen & Boves, 1985). Unfortunately, we did not recognize 
these difficulties at the time of recording, so that the pressure signal could not be calibrated 
properly. However, while absolute values for the pressure data for the subject using this device are 
uninterpretable, the relative pressure levels should be valid, since temperature changes affect the
zero offset but not the sensitivity of the transducer. For both subjects, EMG techniques previously
described (Harris, 1981) were used to record from the cricothyroid muscle. Fundamental frequency
was derived from the output of an accelerometer (Stevens, Kalikow, & Willemain, 1975) attached
to the pretracheal skin surface. For LB, a cepstral technique was used to extract F0 from the
signal. For RC, the accelerometer output was sampled using a Visipitch period-by-period F0
extractor. This latter procedure is equal in accuracy to the former F0 extraction technique, but
has the advantage of on-line sampling at one-half real time. However, it became available to us 
only after the data for the first subject had been analyzed. 

Stimuli 

In the first experiment, the two subjects produced reiterant forms of Dutch utterances, using 
the syllables /ma/ and /fa/ (Appendix A). These utterances were also produced in three lengths, 
with three different emphatic stress configurations (early, double, and late). Thus, there were 
nine utterance types per reiterant condition (i.e., /ma/ or /fa/). However, the stress and length 
conditions will not be discussed separately here, except to be noted in the examples shown, 
because the differences among them have been discussed previously (Gelfer et al., 1985). 

In the second experiment, one of the subjects, RC, produced three similar English sentences. 
For two of the sentences, the syntactic boundary was moved in order to alter slightly the length 
of each clause. The third sentence conjoined two clauses similar to those comprising the first two 
sentences (Appendix B). The subject's task was to produce each sentence under two conditions: 
no pause and no inspiration at the clause boundary, and both a pause and an inspiration at the
clause boundary. 

RESULTS: EXPERIMENT 1 

Averaged subglottal pressure, lung volume, and the amplitude envelope for utterances of
Length 2 with various emphatic stress configurations are shown for both subjects in Figure 1.
It is apparent from this figure that, for the subglottal pressure, there is little difference between
the /ma/ and /fa/ utterances apart from the presence of local perturbations in the curve of the
/fa/ utterances. The acoustic amplitude envelopes of the two reiterant utterance types show
no substantial difference in overall acoustic amplitude, and, as would be expected, resemble the
subglottal pressure contours in overall shape. However, despite the uniformity of the pressure
curves, the lung volume curves for the two utterances show the change in volume over time to be
greater for the /fa/ utterances, as is evidenced by the steeper slopes. Thus, for both subjects, we
observe no apparent relationship between airflow rate and the Ps contours.

Figure 1. Averaged subglottal pressure (panel 1), Respitrace (panel 2), and amplitude envelope (panel 3) curves
for comparable /ma/ and /fa/ utterances for subjects RC (top) and LB (bottom). The vertical line in each panel
denotes the line-up point used for averaging the tokens of each utterance type, which in these utterances is the
onset of the vowel for the first syllable receiving lexical stress. The solid curves represent the reiterant /ma/
utterances, and the dashed curves the reiterant /fa/ utterances. The maximum and minimum values for pressure
on the y axis are 13 cm H2O and 0 cm H2O for RC, 9 cm H2O and -6 cm H2O for LB. For respiratory volume,
values range from 5 liters to 2 liters for RC, and from 5 liters to 1 liter for LB. The audio amplitude is in arbitrary
units.

In order to quantify these data, we plotted the distributions of subglottal pressures and 
airflow rates for the two utterance types. For subglottal pressure, we measured average levels 
over a fixed time interval, rather than differences over time, in order to neutralize any segmental 
effects. Since our earlier work demonstrated that effects of such variables as sentence length are 
reflected in initial peak pressure values (Gelfer et al., 1985), we were careful to eliminate these 
portions of the curves from the measured interval. By calculating the averages over an interval of 
600 ms, from 400 to 1000 ms, after the occurrence of the first lexically stressed syllable, we were 
able to avoid averaging values under these peaks, at the same time being able to include data 
from some of the shortest utterances. 

The same interval was used to calculate the change in lung volume over time. However, 
because the Respitrace curves are rather smooth and not prone to perturbation due to segmental 
effects, we calculated the difference in volume between the two points in order to derive the rate 
of decline (i.e., airflow rate). 
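
The two measures described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `window_measures`, the sampling grid, and the synthetic token are hypothetical and are not taken from the study; only the 400 to 1000 ms window and the two-point volume difference come from the text.

```python
from statistics import mean


def window_measures(times_ms, pressure_cm_h2o, volume_liters,
                    start_ms=400.0, end_ms=1000.0):
    """Return (mean pressure, airflow rate in cc/sec) over the window.

    Times are measured from the line-up point (first lexically stressed
    syllable). Pressure is averaged over the window; airflow rate is the
    volume difference between the two end points divided by their spacing.
    """
    idx = [i for i, t in enumerate(times_ms) if start_ms <= t <= end_ms]
    if len(idx) < 2:
        raise ValueError("window must contain at least two samples")
    mean_pressure = mean(pressure_cm_h2o[i] for i in idx)
    first, last = idx[0], idx[-1]
    duration_s = (times_ms[last] - times_ms[first]) / 1000.0
    # Lung volume declines during expiration; convert liters to cc.
    airflow = (volume_liters[first] - volume_liters[last]) * 1000.0 / duration_s
    return mean_pressure, airflow


# Synthetic token (assumption): samples every 100 ms, flat pressure,
# lung volume falling by 0.2 liter per second.
times = [100.0 * i for i in range(13)]
pressure = [8.0] * len(times)
volume = [4.0 - 0.0002 * t for t in times]
ps_mean, airflow_rate = window_measures(times, pressure, volume)
print(ps_mean, airflow_rate)
```

Averaging the pressure smooths out segmental perturbations, whereas the volume trace is smooth enough that two end points suffice.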

The distributions of Ps measures for all tokens of the /ma/ and /fa/ utterances are shown in 
Figure 2. The difference between the means of these distributions is statistically nonsignificant: 
p > .2 for RC; p > .5 for LB. By contrast, the difference in airflow rate for the /ma/ and /fa/ 
utterances (Figure 3) is statistically significant for both subjects, p < .001. Thus, Ps appears to 
remain stable despite the significant differences in airflow secondary to the phonetic structure of 
these utterances. 

RESULTS: EXPERIMENT 2 

In this experiment, subject RC produced three two-clause utterances under conditions 
where pausing and inspiration were directly manipulated. In the first condition, he produced 
each repetition of each utterance with neither a pause nor inspiration at the clause boundary. In 
the second condition, all tokens were produced with both a pause and inspiration at the clause 
boundary. 

Figure 4 shows the averaged Respitrace and Ps curves for one sentence across the two 
conditions being considered here (i.e., -pause/-inspiration and +pause/+inspiration). This general 
picture is identical across sentence types, so we will present graphic displays only for one sentence. 

In the absence of both a pause and inspiration at the clause boundary in the first condition 
(Panel 1), there is a continuous, although choppy, subglottal pressure curve throughout both 
clauses and across the intervening boundary as well. On the other hand, where both a pause 
and inspiration occur (Panel 2), there is a concomitant drop in the subglottal pressure during the 
inspiration, which then increases significantly as expiration resumes. 

Despite the differences in pause durations and respiratory activity, the subject produced the 
same general F0 contours across conditions (Figure 5). For our analyses, F0 values were measured 
for the first peak in the first clause (peak 1A), the last peak in the first clause (peak 1B), and the 
first peak in the second clause (peak 2A), for the five tokens of each of the three sentences under 
each condition. 

Figure 6 is a schematic representation of the average values, collapsed across sentence type, 
for each condition. It can be seen that, while the F0 values for the two peaks (1A and 1B) in 
the first clause are strikingly similar across conditions, the value of the first peak in the second 
clause (2A) varies systematically as a function of the pausing/breathing condition at the clause 
boundary. That is, where there is no pause or inspiration, the F0 peak falls 8 Hz below the peaks 
that were preceded by an inspiration (Table 1). This difference is statistically significant as well, 
p < .001. 

A comparison of Ps values at peak 2A yields corresponding results. That is, subglottal 
pressure is significantly higher when a pause and inspiration occur than when they do not, 
p < .001. Moreover, when the ratio of frequency change per centimeter of water is calculated for 
peak 2A between conditions 1 and 2, these ratios fall within the accepted range of 3-7 Hz/cm H2O 
(Baer, 1979; Hixon, Klatt, & Mead, 1971; Ladefoged, 1963), suggesting that the relationship 
between the increase in Ps and that in F0 could be more than a correlational one. However, before 
the behavior of F0 is attributed to the presence or absence of an increase in Ps, the contribution 
of laryngeal muscle activity must be determined. 
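
The ratio of frequency change to pressure change can be recomputed directly from the per-sentence values at peak 2A. The sketch below transcribes those values from Table 1 (the last pressure entry is read as 11.6, which is the only value consistent with the tabled mean of 10.5); the function name `hz_per_cm_h2o` is a hypothetical helper, not part of the original analysis.

```python
# Values at peak 2A for Sentences 1-3, transcribed from Table 1.
f0_no_pause = [116, 117, 118]   # Hz, -pause/-inspiration
f0_pause = [123, 122, 130]      # Hz, +pause/+inspiration
ps_no_pause = [7.8, 8.6, 9.1]   # cm H2O, -pause/-inspiration
ps_pause = [9.9, 10.1, 11.6]    # cm H2O, +pause/+inspiration


def hz_per_cm_h2o(f0_a, f0_b, ps_a, ps_b):
    """Change in F0 divided by change in Ps, sentence by sentence."""
    return [(fb - fa) / (pb - pa)
            for fa, fb, pa, pb in zip(f0_a, f0_b, ps_a, ps_b)]


ratios = hz_per_cm_h2o(f0_no_pause, f0_pause, ps_no_pause, ps_pause)
rounded = [round(r, 2) for r in ratios]
in_accepted_range = all(3.0 <= r <= 7.0 for r in ratios)
print(rounded, in_accepted_range)
```

All three ratios fall inside the 3-7 Hz/cm H2O range cited in the text.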

Figure 7 shows the cricothyroid muscle activity for the two conditions for the same sentence. 
It appears that there is no systematic resetting of CT activity as a function of inspiration at 

Figure 2. Distribution of Ps averages for tokens of all /ma/ and /fa/ utterances for both subjects. The solid 
bars denote the /ma/ tokens, and the dashed bars the /fa/ tokens. 






Figure 3. Distribution of airflow rates (cc/sec) for tokens of all /ma/ and /fa/ utterances for both subjects. The 
solid bars denote the /ma/ tokens, and the dashed bars the /fa/ tokens. 

Figure 4. Averaged subglottal pressure and Respitrace curves for a representative sentence across conditions. 
The first panel represents the no pause, no inspiration condition, and the second panel represents the pause plus 
inspiration condition. The line-up point, depicted by the vertical line, represents the onset of voicing for the vowel 
in the word 'plan' in the second clause. The same line-up point was used for all three sentence types. 



[Figure 5 plot: fundamental frequency against time in seconds (-1 to 1) for the sentence beginning "When the lawyer called Reynolds, ..."; panels: NO PAUSE, NO INSPIRATION and PAUSE, INSPIRATION.] 



Figure 5. Averaged F0 contours for a representative sentence across conditions. The first panel represents the 
no pause, no inspiration condition, and the second panel represents the pause plus inspiration condition. The line-up 
point, depicted by the vertical line, represents the onset of voicing for the vowel in the word 'plan' in the second 
clause. The same line-up point was used for all three sentence types. 

Figure 6. Schematic representation of the mean F0 values (peaks 1A, 1B, 2A), collapsed across sentence types, 
for both conditions. The X's denote the no pause, no inspiration condition, and the triangles the pause plus 
inspiration condition. 

Figure 7. Averaged cricothyroid muscle activity for a representative sentence across conditions. The first panel 
represents the no pause, no inspiration condition, and the second panel represents the pause plus inspiration 
condition. The line-up point, depicted by the vertical line, represents the onset of voicing for the vowel in the word 
'plan' in the second clause. The same line-up point was used for all three sentence types. 

Table 1 
Values at Peak 2A 

              Fundamental Frequency (Hz)        Subglottal Pressure (cm H2O) 
              -Pause/-Insp   +Pause/+Insp       -Pause/-Insp   +Pause/+Insp 
Sentence 1        116            123                 7.8            9.9 
Sentence 2        117            122                 8.6           10.1 
Sentence 3        118            130                 9.1           11.6 
Mean              117            125                 8.5           10.5 

              Condition 2 - Condition 1         Ratio 
              F0 (Hz)        Ps (cm H2O)        Hz/cm H2O 
Sentence 1          7            2.1                3.33 
Sentence 2          5            1.5                3.33 
Sentence 3         12            2.5                4.80 

Averaged F0 and Ps values for peak 2A for the three sentences and the ratios of Hz/cm H2O calculated between 
the no pause/no inspiration and the pause plus inspiration conditions. 



the clause boundary. In fact, there is more CT activity following the clause boundary in the 
first condition, where no inspiration occurs. It would thus appear that CT contributes little, 
if anything, to F0 resetting in this case, and that the increase in Ps following an inspiration could 
indeed account for the amount of resetting observed. The above results suggest that when both 
a pause and inspiration occur, there is a significant increase in Ps and F0 values relative to those 
occurring when there is neither a pause nor an inspiration. However, in comparing only these two 
conditions, we are unable to separate the relative effects of breathing and pausing on resetting. 

Our results differ somewhat from those of Collier (1987) who, in certain instances, found a 
greater amount of resetting. In addition, Collier fails to find the substantial effect of inspiration 
on Ps that we do. We believe that these differences may be attributed to differences in the tasks 
in the two studies. That is, while Collier manipulates the stress configuration (i.e., lo-lo; hi-hi) 
around the clause boundary, we do not. Thus, the intentional realization of specific intonation 
contours might result, for example, in greater involvement of CT activity while, at the same time, 
reducing Ps activity. 

Discussion 

It has been known for some time that the respiratory system acts in such a way as to 
stabilize subglottal pressure (e.g., Draper et al., 1960; Mead et al., 1968). The data presented 
here not only confirm the results of these earlier studies, but provide evidence that this control is 
dynamic in nature. Furthermore, this stability is maintained even when the system must respond 
to perturbations in the form of varying airflow requirements. In other words, if lung deflation 
were passive in nature, pressure would certainly decline more rapidly for utterances where greater 
airflow rates are used. However, we have found the rate of pressure decline to be independent of 
the rate of airflow. 

Previous studies in which simultaneous measures of subglottal pressure and fundamental 
frequency have been recorded during sentence production have noted that, through the most 
stable portions of these curves, their decline is relatively parallel (see, for example, Atkinson, 
1973; Collier, 1975; Lieberman, 1967), although a direct cause and effect relationship has been 
difficult to establish. However, Gelfer et al. (1985) were able to demonstrate that, in the absence of 
cricothyroid activity, the fall in pressure accounted for an appropriate fall in frequency. Moreover, 
the rate of both Ps and F0 decline was found to be stable across varying utterance lengths. The 
data presented here suggest that Ps is a controlled variable in sentence production, and that F0 
declination is a consequence. 

Similarly, the resetting of F0 at a clause boundary appears to represent the effect of a general 
resetting of the respiratory system on subglottal pressure following an inspiration. That is, we 
found F0 to be significantly higher when an inspiration occurred at the clause boundary than 
when it did not. At the same time, however, it is difficult to make the claim that the resulting 
difference of 8 Hz is perceptually salient, for it is also the case that the syntactic structure can 
be easily recovered when listening to any token of any of these utterances. It is not entirely 
clear, then, that peak F0 resetting is a necessary mechanism for encoding syntactic structure 
on the part of the speaker, or a prerequisite for decoding syntax on the part of the listener. 
Furthermore, the claim that the extent of F0 resetting is planned by a speaker, in that it has a 
place in the mental representation of an utterance, seems untenable. Rather, resetting would 
appear to be the outcome of an optional speaker strategy (for example, whether a speaker chooses 
to pause or take a new breath, and thus "reset" the whole system, prior to the execution of a 
second clause), and this would be the level at which it is controlled. 

APPENDIX A 

Early Stress: 

Length 1: Je weet dat jan nadenkt. 

Length 2: Je weet dat jan erover nadenkt te betalen. 

Length 3: Je weet dat jan erover nadenkt ons daarvoor met genoegen te betalen. 



Double Stress: 

Length 1: Je weet dat jan nadenkt. 

Length 2: Je weet dat jan erover nadenkt te betalen. 

Length 3: Je weet dat jan erover nadenkt ons daarvoor met genoegen te betalen. 



Late Stress: 

Length 1: Je weet dat jan nadenkt. 

Length 2: Je weet dat jan erover nadenkt te betalen. 

Length 3: Je weet dat jan erover nadenkt ons daarvoor met genoegen te betalen. 



APPENDIX B 

Sentence 1: When the lawyer called Reynolds, the plans were discussed. 
Sentence 2: When the lawyer called, Reynolds' plans were discussed. 
Sentence 3: The lawyer called Reynolds, and the plans were discussed. 



ARTICULATORY SYNTHESIS: NUMERICAL SOLUTION OF A 
HYPERBOLIC DIFFERENTIAL EQUATION 

Abstract. The computation of acoustic pressure fluctuations in a 
variable area tube is often done using the Kelly-Lochbaum reflection 
model. The numerical scheme derived from this model can be put 
into the context of finite-difference approximations to a differential 
equation describing acoustic wave propagation (a hyperbolic 
differential equation). Quantitative criteria for goodness of finite-difference 
schemes (truncation error, stability, and dispersion) are discussed 
without considering the effect of boundary conditions. An alternative 
scheme that has better truncation error than the reflection model 
approximation is examined, but we do not necessarily recommend its 
adoption. The quantitative criteria should be applied to the full 
initial-boundary value problem inherent in articulatory synthesis when a 
numerical scheme is being chosen. 

INTRODUCTION 

In this note one aspect of articulatory synthesis will be considered: that of solving the 
differential equation describing acoustic (small amplitude), one-dimensional propagation of a pressure 
disturbance through a lossless tube with spatially varying cross section. This equation can be 
written: 

\[ \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2} = \frac{1}{Y}\frac{\partial}{\partial x}\left(Y\frac{\partial p}{\partial x}\right), \tag{1} \]

where $Y(x) = A_0/\rho_0 c$ = acoustic admittance, $A_0(x)$ = cross-sectional area of the tube when no 
disturbances are present, $\rho_0$ = density of air with no disturbances, $c$ = adiabatic speed of sound 
in air, $p$ = acoustic perturbation pressure, $t$ = time, and $x$ = distance along the tube axis 
(Lighthill, 1978, pp. 124-125). This equation (the Webster horn equation) will be known as the 
differential equation for the remainder of this note. This equation belongs to the class of hyperbolic 
differential equations, the meaning and consequences of which will be discussed in the rest of this 
note. 

Acknowledgment. Preparation of this manuscript was supported by Grant NS 13617 to Haskins 
Laboratories. The author thanks Philip Rubin for helpful comments. 

In current articulatory synthesis, both time domain and frequency domain, the Kelly-Lochbaum 
reflection model provides a popular method for the computation of sound propagation 
in a tube (Liljencrants, 1985; Rubin, Baer, & Mermelstein, 1981). This method can be 
seen to be a finite-difference approximation to the differential equation. In this context there 
are quantitative measures for goodness of approximation to the solution of the differential equa- 
tion. Three of these will be discussed here: stability, truncation error, and dispersion relations. 
Roughly, a stable method is one for which the solution remains bounded in a finite time span 
as the discrete time interval goes to zero. Truncation error tells us how much better we would 
do in approximating the differential equation if we were to reduce the discrete time and discrete 
spatial intervals of the finite-difference equation. In other words, it says how well the solution 
to the differential equation solves the finite-difference approximation. Dispersion relations, or 
the relationship between frequency and wavenumber, can be derived for solutions to both the 
differential equation and the finite-difference approximation. These relationships should express 
the same relation to a close approximation because the ratio of frequency to wavenumber gives 
the phase speed (Trefethen, 1982). Dispersion error has been considered previously in the speech 
literature, where it is sometimes called frequency warping (Maeda, 1982; Portnoff, 1973). 

Considerations of truncation error will allow us to propose another finite-difference approxi- 
mation, which is a slight modification to that provided by the Kelly-Lochbaum reflection model. 
Then we will consider the stability and dispersion relations of both approximations. Because the 
boundary conditions inherent in the articulatory synthesis problem are not considered in the anal- 
ysis here, we cannot recommend one method over the other. The alternative method illustrates 
the possibility of deriving other efficient finite-difference schemes with, perhaps, better numerical 
properties. 

We will be applying the von Neumann stability condition to the finite-difference methods 
in this note. This condition does not provide a sufficient condition for the full initial-boundary 
value problem of articulatory synthesis. Under special circumstances it does provide necessary 
and sufficient conditions for pure initial value problems with constant coefficient difference 
equations (Richtmeyer & Morton, 1967, pp. 68-72). However, the von Neumann condition, applied 
locally, will be a necessary condition for strong stability in the full initial-boundary value problem 
(Richtmeyer & Morton, 1967, pp. 99, 132). The von Neumann condition will be stated below when 
it is invoked. 



STRUCTURE OF THE DIFFERENTIAL EQUATION 

First, we will explore the structure of the differential equation with a few transformations, 
which will help illustrate the meaning of the phrase: hyperbolic differential equation. The second 
order differential equation can be written as a system of two first order differential equations: 

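\[ \frac{\partial p}{\partial t} = -\frac{c}{Y}\frac{\partial J}{\partial x}, \tag{2} \]

\[ \frac{\partial J}{\partial t} = -cY\frac{\partial p}{\partial x}, \tag{3} \]

where $J$ is the perturbation volume velocity in the small amplitude limit (Lighthill, 1978, pp. 124-125). 
In matrix notation: 

\[ I\frac{\partial U}{\partial t} + cA\frac{\partial U}{\partial x} = 0, \tag{4} \]

where: 

\[ U = \begin{pmatrix} p \\ J \end{pmatrix}, \qquad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & Y^{-1} \\ Y & 0 \end{pmatrix}. \]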

Next, we will perform a couple of similarity transformations on the $(p, J)$ space that will help 
to simplify the form of equation (4). The $(p, J)$ space is transformed by a stretching 
transformation, $B$, and then a rotation, $G$. Because the dependent variables, $p$ and $J$, are transformed, the 
differential equation (4) must also be transformed. In particular, the coefficient matrix $A$ will be 
transformed to a matrix in diagonal form. Let: 



\[ V = GBU, \tag{5} \]

where 

\[ B = \begin{pmatrix} \sqrt{Y} & 0 \\ 0 & 1/\sqrt{Y} \end{pmatrix}. \]

Thus: 



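As a result of the transformations on the $(p, J)$ space, the system (4) is transformed into (see 
appendix): 

\[ I\frac{\partial V}{\partial t} + cH\frac{\partial V}{\partial x} = cKV, \tag{7} \]

where 

\[ K = \frac{d\log(Y)}{dx}\begin{pmatrix} 0 & -1/2 \\ 1/2 & 0 \end{pmatrix}, \qquad H = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = GBA(GB)^{-1}. \]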

By the form of the relation between them, A and H are similar matrices. We have diagonalized 
the coefficient matrix that is multiplying the spatial derivative (i.e., transformed A to H), while 
leaving the identity matrix as the coefficient matrix of the time derivative. Because H has real 
eigenvalues, the system (7) is hyperbolic. Because A and H are similar, A has the same two 
real eigenvalues, and the system (4) is hyperbolic, and the original differential equation (1) is a 
hyperbolic differential equation. 
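
The eigenvalue claim can be checked numerically. The following sketch assumes that the coefficient matrix has the form A = [[0, 1/Y], [Y, 0]], as read from the matrix form of system (4); the helper names are hypothetical. Whatever positive admittance is used, the eigenvalues come out as +1 and -1, the diagonal entries of H.

```python
import math


def eigenvalues_2x2(m):
    """Real eigenvalues of a 2x2 matrix from its trace and determinant."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    disc = trace * trace - 4.0 * det
    if disc < 0:
        raise ValueError("complex eigenvalues: the system is not hyperbolic")
    root = math.sqrt(disc)
    return ((trace + root) / 2.0, (trace - root) / 2.0)


def coefficient_matrix(admittance):
    """Matrix A of system (4), assumed to be [[0, 1/Y], [Y, 0]]."""
    return ((0.0, 1.0 / admittance), (admittance, 0.0))


results = {Y: eigenvalues_2x2(coefficient_matrix(Y)) for Y in (0.25, 1.0, 4.0)}
for Y, eigs in results.items():
    print(Y, eigs)
```

Multiplied by c, these eigenvalues are the two propagation speeds, +c and -c, along the characteristic lines.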

The implications for a system having the property of being hyperbolic are best illustrated by 
considering the system (7). By a change of the independent variables in (7), we can make further 
simplifications. Let: 

\[ \zeta = t + x/c, \qquad \xi = t - x/c. \tag{8} \]

In terms of these variables, system (7) becomes: 

\[ \frac{\partial v^+}{\partial \zeta} = -\frac{c}{4}\frac{d\log(Y)}{dx}\,v^-, \qquad \frac{\partial v^-}{\partial \xi} = \frac{c}{4}\frac{d\log(Y)}{dx}\,v^+. \tag{9} \]



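Also, equation (8) can be expressed as a set of differential equations: 

\[ \frac{dx}{dt} = c \ \ \text{on } \xi = \text{constant}, \qquad \frac{dx}{dt} = -c \ \ \text{on } \zeta = \text{constant}. \tag{10} \]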



The set of equations (9) and (10) constitutes the canonical system of the original differential 
system (4) (Forsythe & Wasow, 1960, p. 43). One way to solve the second-order hyperbolic 
system in two independent variables is by integrating the system (9) simultaneously along the 
characteristic lines, $\zeta$ = constant and $\xi$ = constant, given by (10). Because each component 
equation in system (9) involves derivatives of the dependent variable along one characteristic line 
only, they may be treated as coupled ordinary differential equations for the sake of computation. 

In Figure 1, the geometry of the situation in the $(x/c, t)$ plane is illustrated. For articulatory 
synthesis, the inflow boundary conditions are normally specified for $x/c = 0$, and impedance 
boundary conditions at $x/c = l/c$. An initial condition should also be specified at $t = 0$. This 
leads to a well-posed initial-boundary value problem (Higdon, 1986). The discussion of boundary 
conditions will be postponed until a later note, and only the pure initial value problem will be 
considered here. 



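[Figure 1 plot: characteristic lines in the (x/c, t) plane; time axis marked at nh/c, (n+1/2)h/c, and (n+1)h/c; x/c axis marked at jh/c, (j+1/2)h/c, and (j+1)h/c.] 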



Figure 1. Characteristic lines. 

The meanings of the superscripts + and - on the dependent variables $v^+$ and $v^-$ in 
equation (6) will now be explained. First, express volume velocity, $J$, and pressure, $p$, in terms 
of volume velocities in the positive and negative x-direction. 

\[ J = J^+ - J^-, \qquad p = Y^{-1}(J^+ + J^-). \tag{11} \]

It can be seen that the new dependent variables are just scaled versions of the positive-going 
volume velocity and the negative-going volume velocity. The scaling depends upon spatial position. 

\[ v^+ = \sqrt{2}\,Y^{-1/2}J^+, \qquad v^- = -\sqrt{2}\,Y^{-1/2}J^-. \tag{12} \]



Note that in the case of constant area and no boundaries there is a particularly simple 
solution to (9) and (10): that of two waves travelling at speed $c$ in opposite directions, without 
change of form. More generally, if the logarithmic derivative of the area with respect to $x$ is 
small, then energy along characteristic lines is approximately constant. This is seen by noting 
that the right-hand sides of (9) are approximately zero, and that the intensities of the positive and 
negative-going waves are $Y^{-1}(J^+)^2$ and $Y^{-1}(J^-)^2$, respectively (see Lighthill, 1978, pp. 120-123). 

THE REFLECTION MODEL AS A FINITE-DIFFERENCE SCHEME 

The Kelly-Lochbaum reflection model can be seen to provide a finite-difference approximation 
to the system (8) if the following approximations are made. We will make reference to Figure 1, 
and let the step sizes be defined: 

\[ \Delta\zeta = \Delta\xi = 2h/c, \qquad \Delta x/c = \Delta t = h/c, \tag{13} \]

where $h > 0$. 



We will be assuming that the dependent variables and the admittance function all have 
continuous third derivatives. This smoothness condition allows us to make use of Taylor's formula 
with remainder to estimate truncation error in the approximations. Normally, truncation error 
is written in terms of powers of the step size. For example, $f(x,h)$ is said to approximate $g(x)$ to 
$O(h^N)$ if: 

\[ g(x) = f(x,h) + q(x,h)\,h^N, \]

where: 

\[ \lim_{h \to 0} |q(x,h)| < \infty. \]

We normally write: 

\[ g(x) = f(x,h) + O(h^N). \]

However, we will sometimes write the function $q(x,h)$ explicitly to show how the error depends 
on the smoothness of certain other functions. 






Richard McGo 



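The first derivatives in equation (9) are approximated: 

\[ \left.\frac{\partial v^+}{\partial \zeta}\right|^{(n+1/2)}_{(j+1/2)} = \frac{v^{+(n+1)}_{(j+1)} - v^{+(n)}_{(j)}}{2h/c} - \frac{1}{6}\left.\frac{\partial^3 v^+}{\partial \zeta^3}\right|_{\zeta^*}\left(\frac{h}{c}\right)^2, \]

\[ \left.\frac{\partial v^-}{\partial \xi}\right|^{(n+1/2)}_{(j+1/2)} = \frac{v^{-(n+1)}_{(j)} - v^{-(n)}_{(j+1)}}{2h/c} - \frac{1}{6}\left.\frac{\partial^3 v^-}{\partial \xi^3}\right|_{\xi^*}\left(\frac{h}{c}\right)^2, \tag{14} \]

where $v^{+(n)}_{(j)}$ refers to $v^+(t,x)$ at $t = nh/c$ and $x/c = jh/c$, and $(j+n)h/c < \zeta^* < (j+n+2)h/c$ 
and $(n-j-1)h/c < \xi^* < (n-j+1)h/c$. The derivatives are approximated by centered 
differences along the characteristic lines. The logarithmic derivative of the admittance must also 
be approximated. 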



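\[ \left.\frac{d\log(Y)}{dx}\right|_{(j+1/2)} = \frac{1}{Y_{(j+1/2)}}\left.\frac{dY}{dx}\right|_{(j+1/2)} = \frac{\frac{2}{h}\left(Y_{(j+1)} - Y_{(j)}\right) - \frac{1}{3}\frac{d^3Y}{dx^3}(x^*)\,(h/2)^2}{\left(Y_{(j+1)} + Y_{(j)}\right) - \frac{d^2Y}{dx^2}(x^{**})\,(h/2)^2} = \frac{2}{h}\,\frac{Y_{(j+1)} - Y_{(j)}}{Y_{(j+1)} + Y_{(j)}} + O(h^2), \tag{15} \]

where $jh < x^*, x^{**} < (j+1)h$. The finite-difference approximation to this derivative is, to within 
a constant factor, the reflection coefficient, $\mu_{(j+1/2)}$, for a tube with a discontinuous change in 
area at $x = (j+1/2)h$. Note, given the smoothness of $Y(x)$, that: 

\[ \left|\mu_{(j+1/2)}\right| = \left|\frac{Y_{(j+1)} - Y_{(j)}}{Y_{(j+1)} + Y_{(j)}}\right| = \left|\frac{h\,\frac{dY}{dx}(x^{***})}{Y_{(j+1)} + Y_{(j)}}\right| < O(h), \]

where $jh < x^{***} < (j+1)h$. The values of the dependent variables at $t = (n+1/2)h/c$, 
$x/c = (j+1/2)h/c$ also need to be estimated: 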



-Mn + 1/2) f(n) . f 

M ' — Jl -1- 

O+J/2) l O) r ^ 



- (n4 1/2) ( -(») ^ r 



(Hi) 



where - j - l)h/c < < (n - j ■+ 1 j/i/c and (// }- 7 )/i/r <" ('* - (7? f j { 2)/i/c, 
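
The approximation of the logarithmic derivative of the admittance by the reflection coefficient, used above, can be checked numerically. The sketch below assumes that the reflection coefficient is (Y(j+1) - Y(j)) / (Y(j+1) + Y(j)) and uses an exponential admittance Y = exp(ax), for which the logarithmic derivative is exactly a. The function name and the test values are hypothetical.

```python
import math


def log_derivative_estimate(admittance, x_left, h):
    """Estimate d log(Y)/dx at x_left + h/2 as (2/h) * reflection coefficient."""
    y0, y1 = admittance(x_left), admittance(x_left + h)
    mu = (y1 - y0) / (y1 + y0)  # reflection coefficient across the step
    return 2.0 * mu / h


a = 3.0  # flare constant of the exponential admittance (assumption)


def exponential_admittance(x):
    return math.exp(a * x)


steps = (0.1, 0.05, 0.025)
errors = [abs(log_derivative_estimate(exponential_admittance, 0.2, h) - a)
          for h in steps]
error_ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print(errors, error_ratios)
```

Halving h reduces the error by roughly a factor of four, which is the behavior expected of an approximation that is accurate to second order in h.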



We write the resulting finite-difference approximation to (9), using relations (11) through 



r + (n+l) 



r O + U r+(n) 

^ 0) + 



'(j) 



' r-(n) 
*041) 



-MO+i/2) + 0(/i 3 ) 

/*O + l/2> + 0(/» 3 ) 



-J, 



-(n) 
0+1) 



+ 0(/i 3 ) 



^; n) +o(/.) 



(17) 



+ 0(/, 3 ) 



The square roots can be approximated: 

\[ \sqrt{\frac{Y_{(j)}}{Y_{(j+1)}}} = 1 - \mu_{(j+1/2)} + O(h^2). \tag{18} \]



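Finally, the finite-difference approximation can be written: 

\[ J^{+(n+1)}_{(j+1)} = \left(1 + \mu_{(j+1/2)}\right)J^{+(n)}_{(j)} + \mu_{(j+1/2)}J^{-(n)}_{(j+1)} + O(h^2), \]

\[ J^{-(n+1)}_{(j)} = \left(1 - \mu_{(j+1/2)}\right)J^{-(n)}_{(j+1)} - \mu_{(j+1/2)}J^{+(n)}_{(j)} + O(h^2). \tag{19} \]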



Neglecting the truncation errors, these relations are the same as those provided by the 
Kelly-Lochbaum model (Markel & Gray, 1976, pp. 66-67). In the analysis presented here, it is necessary 
that the derivatives of $Y$ appearing in the remainder terms of (15) be small in order for the 
truncation error to be small, that is, $Y$ should be relatively smooth. 
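
One time step of the reflection model can be sketched as follows. The update assumed here is the scattering relation in the Markel and Gray form: the forward wave leaving junction j+1/2 is (1 + mu) times the incoming forward wave plus mu times the incoming backward wave, and the backward wave is -mu times the incoming forward wave plus (1 - mu) times the incoming backward wave. The function names are hypothetical, and the treatment of the tube ends (nothing enters from outside) is an assumption made only to keep the example self-contained, since boundary conditions are not considered in this note.

```python
def reflection_coefficients(admittance):
    """mu at each junction: (Y[j+1] - Y[j]) / (Y[j+1] + Y[j])."""
    return [(admittance[j + 1] - admittance[j]) / (admittance[j + 1] + admittance[j])
            for j in range(len(admittance) - 1)]


def kelly_lochbaum_step(j_plus, j_minus, mu):
    """Advance forward (j_plus) and backward (j_minus) waves by one step."""
    n = len(j_plus)
    new_plus = [0.0] * n    # assumption: no inflow at the left end
    new_minus = [0.0] * n   # assumption: no inflow at the right end
    for j in range(n - 1):
        new_plus[j + 1] = (1 + mu[j]) * j_plus[j] + mu[j] * j_minus[j + 1]
        new_minus[j] = -mu[j] * j_plus[j] + (1 - mu[j]) * j_minus[j + 1]
    return new_plus, new_minus


# Uniform tube: a forward pulse simply shifts one section to the right.
uniform = kelly_lochbaum_step([0.0, 1.0, 0.0, 0.0], [0.0] * 4,
                              reflection_coefficients([1.0] * 4))

# Tube whose admittance triples between sections 1 and 2.
stepped = kelly_lochbaum_step([0.0, 1.0, 0.0, 0.0], [0.0] * 4,
                              reflection_coefficients([1.0, 1.0, 3.0, 3.0]))
print(uniform)
print(stepped)
```

In the stepped tube the pulse is partly transmitted (1.5) and partly reflected (-0.5), so the net volume velocity, forward minus backward, is the same on both sides of the junction.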

Another analysis may be possible for a discontinuous admittance function. Work on matched 
asymptotic expansions has shown that the conditions of continuity of pressure and volume velocity 
used in the derivation of the Kelly-Lochbaum reflection model are valid to the first order in a 
compactness parameter, even at abrupt area changes (Lesser & Lewis, 1972). A compactness 
parameter would be the ratio of the width of the tube section to the wavelength, where the tube 
width is assumed to be much smaller than the wavelength of sound. This may not be justified 
if the tube sections are so short that the cut-off modes can leak from one section to another 
(Thompson, 1984). 

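Note that an $O(h^2)$ error is made in the approximation (18). This could be avoided simply 
by using $v^+$ and $v^-$ as the dependent variables, rather than $J^+$ and $J^-$. $O(h)$ errors are made in 
equations (16) in the evaluation of $v^{+(n+1/2)}_{(j+1/2)}$ and in the evaluation of $v^{-(n+1/2)}_{(j+1/2)}$. This error could 
be improved to be $O(h^2)$ by taking averages. That is, approximate $v^{+(n+1/2)}_{(j+1/2)}$ by 
$\frac{1}{2}\left(v^{+(n+1)}_{(j+1)} + v^{+(n)}_{(j)}\right)$ and $v^{-(n+1/2)}_{(j+1/2)}$ by $\frac{1}{2}\left(v^{-(n+1)}_{(j)} + v^{-(n)}_{(j+1)}\right)$. If these changes are made, 
the resulting finite-difference approximation appears as: 

\[ \begin{pmatrix} v^{+(n+1)}_{(j+1)} \\ v^{-(n+1)}_{(j)} \end{pmatrix} = \frac{1}{1 + \left(\mu_{(j+1/2)}/2\right)^2}\begin{pmatrix} 1 - \left(\mu_{(j+1/2)}/2\right)^2 & -\mu_{(j+1/2)} \\ \mu_{(j+1/2)} & 1 - \left(\mu_{(j+1/2)}/2\right)^2 \end{pmatrix}\begin{pmatrix} v^{+(n)}_{(j)} \\ v^{-(n)}_{(j+1)} \end{pmatrix}. \tag{20} \]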



In matrix notation, the approximation provided by the Kelly-Lochbaum model, equation (19), is: 

\[ \begin{pmatrix} J^{+(n+1)}_{(j+1)} \\ J^{-(n+1)}_{(j)} \end{pmatrix} = \begin{pmatrix} 1 + \mu_{(j+1/2)} & \mu_{(j+1/2)} \\ -\mu_{(j+1/2)} & 1 - \mu_{(j+1/2)} \end{pmatrix}\begin{pmatrix} J^{+(n)}_{(j)} \\ J^{-(n)}_{(j+1)} \end{pmatrix}. \tag{21} \]

Stability and Dispersion Errors 

In the following, we would like to find whether the Euclidean norms of the solution vectors 
to the equations are uniformly bounded in a finite time interval, locally in space, as the step size, 
$h$, approaches zero (Richtmeyer & Morton, 1967, pp. 68-73). This is a local stability property. 
Local stability is used since the matrices are functions of the spatial coordinates and, without a 
complete specification of boundary conditions, we cannot talk about the difficult global stability 
problem. However, to have global stability in the strong sense as defined by Richtmeyer and 
Morton (1967, p. 99), it is necessary to have the stability defined above in the local sense. 

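Operationally, the local stability can be determined in the following way. Take a Fourier 
transform of the dependent variables against the spatial coordinate. Let $y = \exp(ikh)$. For 
example: 

\[ v^{+(n)}_{(j)} = \int_{-\infty}^{\infty} \hat{v}^{+(n)}(k)\exp(i(jh)k)\,dk = \int_{-\infty}^{\infty} \hat{v}^{+(n)}(k)\,y^j\,dk. \]

For each Fourier component, equations (20) and (21) become: 

\[ \begin{pmatrix} \hat{v}^{+(n+1)} \\ \hat{v}^{-(n+1)} \end{pmatrix} = \frac{1}{1 + \left(\mu_{(j+1/2)}/2\right)^2}\begin{pmatrix} \left(1 - \left(\mu_{(j+1/2)}/2\right)^2\right)y^{-1} & -\mu_{(j+1/2)} \\ \mu_{(j+1/2)} & \left(1 - \left(\mu_{(j+1/2)}/2\right)^2\right)y \end{pmatrix}\begin{pmatrix} \hat{v}^{+(n)} \\ \hat{v}^{-(n)} \end{pmatrix} \tag{22} \]

and 

\[ \begin{pmatrix} \hat{J}^{+(n+1)} \\ \hat{J}^{-(n+1)} \end{pmatrix} = \begin{pmatrix} \left(1 + \mu_{(j+1/2)}\right)y^{-1} & \mu_{(j+1/2)} \\ -\mu_{(j+1/2)} & \left(1 - \mu_{(j+1/2)}\right)y \end{pmatrix}\begin{pmatrix} \hat{J}^{+(n)} \\ \hat{J}^{-(n)} \end{pmatrix}. \tag{23} \]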

Local stability depends on the norm of the matrix (amplification matrix) in (22) or (23), that is, 
it depends upon the spectral radius of the matrix (i.e., the magnitude of its largest eigenvalue). 
The von Neumann condition for stability is that the eigenvalues of an amplification matrix, $\lambda$, 
must satisfy: 

\[ |\lambda| \le 1 + O(h), \qquad h \to 0. \]

This condition is both sufficient and necessary only in the case the amplification matrix is normal 
(i.e., commutes with its adjoint), otherwise it is just necessary (Richtmeyer & Morton, 1967, 
pp. 68-73). After some algebra, we find eigenvalues for (22) and (23), respectively, satisfy: 

\[ \lambda^2 - 2\left(1 - \left(\mu_{(j+1/2)}/2\right)^2\right)\cos(kh)\,\lambda + \left(1 + \left(\mu_{(j+1/2)}/2\right)^2\right)^2 = 0 \tag{24} \]

and 

\[ \lambda^2 - 2\left(\cos(kh) - i\mu_{(j+1/2)}\sin(kh)\right)\lambda + 1 = 0. \tag{25} \]

Since the admittance function is continuously differentiable, 

\[ \left|\mu_{(j+1/2)}\right| < O(h). \]

With this we see that both amplification matrices satisfy the von Neumann condition. 
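
The size of the eigenvalues can also be examined numerically by solving the two characteristic equations as they are written in (24) and (25), i.e., with coefficients -2(1 - (mu/2)^2)cos(kh) and (1 + (mu/2)^2)^2 for (24), and -2(cos(kh) - i mu sin(kh)) and 1 for (25). The sketch below is only a check under that reading of the equations; the function names and the sampling of kh are hypothetical.

```python
import cmath
import math


def quadratic_roots(b, c):
    """Roots of lambda**2 + b*lambda + c = 0; complex coefficients allowed."""
    d = cmath.sqrt(b * b - 4 * c)
    return ((-b + d) / 2, (-b - d) / 2)


def max_modulus_24(mu, kh):
    a = (mu / 2) ** 2
    return max(abs(r) for r in
               quadratic_roots(-2 * (1 - a) * math.cos(kh), (1 + a) ** 2))


def max_modulus_25(mu, kh):
    b = -2 * (math.cos(kh) - 1j * mu * math.sin(kh))
    return max(abs(r) for r in quadratic_roots(b, 1.0))


def worst_case(max_modulus, mu, samples=181):
    """Largest eigenvalue modulus over kh sampled from 0 to pi."""
    return max(max_modulus(mu, math.pi * i / (samples - 1))
               for i in range(samples))


mu = 0.05
worst_24 = worst_case(max_modulus_24, mu)
worst_25 = worst_case(max_modulus_25, mu)
print(worst_24, worst_25)
```

For this value of mu the roots of (24) have modulus 1 + (mu/2)^2 for every kh, while the largest root of (25) has modulus mu + sqrt(1 + mu^2), reached at kh = pi/2. Because mu is itself of order h, both stay within 1 + O(h).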

In the case that $\mu_{(j+1/2)} = 0$ for all $j$, we would like a stronger stability, namely $|\lambda| \le 1$, 
because there are no solutions of the exact differential equation that grow when area is a constant. 
In this case both (24) and (25) reduce to: 

\[ \lambda^2 - 2\cos(kh)\,\lambda + 1 = 0, \tag{26} \]

and hence $|\lambda| = 1$. (Not only are the systems (20) and (21) stable in this case, but they conserve 
the magnitude of the dependent variable. The matrices are normal and, in fact, are the identity 
matrix.) We are able to meet these stability conditions because our time and space intervals, $\Delta t$ 
and $\Delta x$, are related by $c\,\Delta t/\Delta x = 1$, which is a special case of the Courant condition: $c\,\Delta t/\Delta x \le 1$ 
(Mitchell, 1969). 

We now compare the dispersion relation of waves that propagate according to the 
finite-difference schemes (20) and (21) to the dispersion relation of the waves that are an exact solution to 
the differential equation. The exact solution we will use is that of propagation in an exponential 
tube: 

\[ Y(x) = \exp(ax)/\rho_0 c. \]

The exact volume velocity wave traveling in the positive $x$ direction with circular frequency, $\omega$, 
is given as (Lighthill, 1978): 

\[ J^+ = \hat{J}^+\exp\left[i\omega t - i\left((\omega/c)^2 - (a/2)^2\right)^{1/2}x + (a/2)x\right]. \tag{27} \]

In terms of the dependent variable $v^+$, the solution is: 

\[ v^+ = \hat{v}^+\exp\left[i\omega t - i\left((\omega/c)^2 - (a/2)^2\right)^{1/2}x\right]. \tag{28} \]

The phase, $\theta$, is the same for both (27) and (28): 

\[ \theta = \omega t - \left((\omega/c)^2 - (a/2)^2\right)^{1/2}x. \tag{29} \]

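The dispersion relation is a relationship between the time and spatial dependence of the phase 
function. More exactly, let: 

\[ \omega = \frac{\partial\theta}{\partial t}, \qquad k = -\frac{\partial\theta}{\partial x}. \]

Then the dispersion relation is of the form: $g(\omega,k) = 0$. The dispersion relation for the exact 
solutions is: 

\[ \left(\frac{\omega}{c}\right)^2 = k^2 + \left(\frac{a}{2}\right)^2. \tag{30} \]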

The dispersion relation for the finite-difference approximations can be derived by performing a 
Fourier transform in both space and time. Let: 

\[ v^{+(n)}_{(j)} = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} V^+(\omega,k)\,y^j z^n\,d\omega\,dk, \]

where 

\[ z = \exp(i\omega h/c), \qquad y = \exp(ikh). \]
Substituting into the finite-difference approximations (20) and (21): 



/V f \ 1 /(l-^o+i^^jr 1 ;- 1 -/«on/J)»'' \ (V 

\V -j l K/i, J+ ,/2)/2) 2 V mo+i/j)*" 1 (l -(/'o4i/2)/2) 2 y 



V 



) 



(31) 
(32) 
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The resulting systems of homogeneous linear equations must have determinant zero. Both 
systems satisfy: 

(!)• + „(*»). (33, 

where the neglected terms include the factors: fc 4 /i 2 , (u;/c) 4 /i 2 , a 4 /i 2 , a 3 Jfc/i 2 , and ak 3 h 2 . 
In order to keep dispersion errors small we must keep the spatial divisions small with respect to 
wavelength, and time divisions small with respect to wave period. Also, as before, the admittance 
function must be smooth: the rate of change of area with respect to x must not be too large. 
From the above, both finite-difference schemes, (20) and (21), are seen to provide, practically, the
same approximation to the dispersion relation of the original differential equation.

Conclusion 

The computational scheme resulting from the Kelly-Lochbaum reflection model has been put 
into the context of a finite-difference approximation to the differential equation. With a couple 
of minor changes, we were able to derive a finite-difference approximation with better truncation 
error properties, without giving anything up in terms of the von Neumann stability conditions and 
dispersion. We do not necessarily recommend this modified scheme for computational purposes, 
since the full initial-boundary value problem has not been considered. 

There are many numerical methods that can be considered. One such is the Lax-Wendroff
scheme, which has at least as good truncation error as the modified scheme presented here
(Mitchell, 1969). Another method is integrating along characteristics in the manner of solving 
simultaneous ordinary differential equations, where predictor-correctors could be used (Thomas, 
1954). Portnoff (1973) considered an implicit scheme for solving the differential equation. Implicit 
schemes are attractive because stability does not depend on small time step sizes and the bound- 
ary conditions are easily incorporated. However, there is a trade-off in terms of computational 
ease, where implicit schemes involve at least one matrix inversion to update all spatial positions 
simultaneously. 

In this note, we took the starting point as a differential equation describing acoustic wave 
propagation which can be derived from conservation laws and under known approximations. A 
numerical method can be chosen for the solution of this differential equation, where bounds can 
be found on the error of the numerical approximation. We believe that carefully going from 
conservation laws to synthesis can help assess the physical model on which the synthesis is based. 
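Appendix

To derive equation (7) from equation (4), left multiply equation (4) by GB:

$$\frac{\partial}{\partial t}(GB)U + c(GB)A(GB)^{-1}(GB)\frac{\partial}{\partial x}U = 0.$$

Using the definitions of V and H, and using the product rule for differentiation:

$$\frac{\partial V}{\partial t} + cH\frac{\partial V}{\partial x} - cH\left(\frac{\partial GB}{\partial x}\right)(GB)^{-1}V = 0.$$

Equation (7) results if:

$$K = H\left(\frac{\partial GB}{\partial x}\right)(GB)^{-1}.$$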
TYPE AND NUMBER OF VIOLATIONS AND THE GRAMMATICAL
CONGRUENCY EFFECT IN LEXICAL DECISION*



G. Lukatela,** A. Kostic,** D. Todorovic,** C. Carello,† and M. T. Turvey‡

Abstract. An experiment was conducted in the Serbo-Croatian lan-
guage in which native speakers/readers made lexical decisions on in-
flected nouns and legally inflected pseudonouns following inflected pos-
sessive pronouns. A possessive pronoun and the noun or pseudonoun
that followed it could agree in case, gender, and number (0 violations),
disagree in either case or gender or number (1 violation) or disagree
simultaneously on two of the three (2 violations). A grammatical con-
gruency effect was observed for both nouns and pseudonouns. Accep-
tance latencies were shorter and rejection latencies were longer for
inflectional agreement than inflectional disagreement. However, for 
neither nouns nor pseudonouns was the magnitude of the effect influ- 
enced by the type or number of violations. The results are discussed in 
terms of (1) the automaticity of syntactic processes and (2) the prop- 
erties of a decision making device (specially tailored to rapid lexical 
evaluations) relative to the properties of the language processor. 

INTRODUCTION 

A growing body of evidence supports the notion that syntactical or grammatical relatedness 
colors the way in which one word affects the processing of another. Investigations with English 
language materials address this issue by violating the natural ordering of parts of speech. For 
example, lexical decision to a target is speeded when the context-target pair is ordered legally 
relative to when it is ordered illegally (e.g., men-swear vs. whose-swear [Goodman, McClelland, &
Gibbs, 1981]; "For now the happy family lives with BATTERIES" vs. "For now the happy family
lives with FORMULATE" [Wright & Garrett, 1984]). In contrast, investigations with Serbo-
Croatian materials have been able to preserve the ordinary adjacencies of parts of speech because
grammatical violations can be introduced at the level of inflected morphemes. Grammatically
acceptable pronoun-verb pairs must agree in person and number while adjective-noun pairs must
agree in case, number, and gender. Violations of these relationships result in a grammatical
congruency effect, viz., lexical decisions to targets in a grammatically incongruent context are
slow relative to those same targets in grammatically congruent contexts. As examples, lexical
decisions to verb targets are faster when the preceding personal pronoun agrees in person than 
when it does not (Lukatela, Moraca, Stojnov, Savic, Katz, & Turvey, 1982); decision times to 
nouns with a case inflection appropriate for a preceding preposition are speeded relative to those 
with an inappropriate inflection (Lukatela, Kostic, Feldman, & Turvey, 1983); slowed decision 
times are found for violations of case agreement between adjectives and nouns or legally inflected 
pseudoadjectives and nouns (Gurjanov, Lukatela, Moskovljevic, Savic, & Turvey, 1985); and 
nouns that agree with their possessive pronoun contexts in gender are lexically evaluated faster 
than those that do not agree (Gurjanov, G. Lukatela, K. Lukatela, Savic, & Turvey, 1986). 

It has been argued that syntactic influences on lexical decision are post-lexical (Gurjanov et 
al., 1985, 1986; Seidenberg, Waters, Sanders, & Langer, 1984; West & Stanovich, 1982). That
is to say, unlike the spreading activation among particular lexical items that is conjectured for 
associative priming (deGroot, 1983), the grammatical congruency effect is thought to be the result 
of a check on grammatical coherency of the given context-target pair (cf. deGroot, Thomassen, & 
Hudson, 1982; Gurjanov et al., 1986). The reason is quite simple: If the congruency effect were 
the result of spreading activation, then a prime would have to activate all words of a given type
(e.g., all nouns of a particular case). It seems unlikely, therefore, that relations among lexical 
entries are responsible for syntactical influences on lexical decision. 

Let us, then, provide a framework for this coherence checker. The central notion is that 
the language processor is composed of three relatively autonomous devices. One accesses lex-
ical representations of each member of an arrangement of words, another assigns a syntactical
structure to the arrangement of words, and the third assigns meaning to the arrangement of
words (cf. Forster, 1979). In the course of normal language comprehension, all three devices are
necessary. In the experimentally contrived situation of a lexical decision task, although it would
seem that the lexical processor is all that is required, the other devices cannot be disengaged.
With a grammatically congruent context-target pair, all devices provide positive output (i.e., each
performs its usual function) so that the job of the decision-making mechanism is easy. With a
grammatically incongruent pair, however, the syntactic processor balks because part of the infor-
mation made available by the lexical processor is that, for example, the context is masculine and
the target is feminine. The lexical decision mechanism must overcome the negative bias from the 
syntactical processor (cf. deGroot, 1985; West & Stanovich, 1982), resulting in slower decision 
times. 

It was mentioned earlier that grammatical congruency in the Serbo-Croatian language is 
defined over several dimensions. At issue in the present investigation is whether or not the
congruency effect for possessive pronoun-noun pairs is influenced by: (1) which grammatical
dimension (gender, number, or case) is violated, or (2) how many grammatical dimensions are
violated. In other words, is the negative bias that is induced by the coherence check altered by 
the type or the extent of grammatical violation? 

This question is directed primarily at the nature of the device that makes a decision about
a word's lexical status on the basis of the information it receives from the largely independent
lexical, syntactic, and message processors. These latter processes are presumed to be "hard
molded, hard algorithmed." The decision making device, on the other hand, is presumed to be
"soft molded, soft algorithmed." It represents the fact that an ordinary speaker/reader of the
language has temporarily made him or herself into a special purpose mechanism, one geared to
reporting rapidly on the lexical status of printed letter strings. One could imagine that it is in the
nature of this soft-molded decision making device to weight the outcomes of the lexical, syntactic,
and message processors. In a lexical decision experiment, for example, the lexical processor ought
to be weighted most heavily. The value of the message processor would depend on how informative
it is, given the constraints of the experimental situation. To anticipate our method, the present
investigation simply uses some form of the possessive pronouns MY or YOUR on every trial. The
message processor, therefore, is relatively noninformative and ought to be weighted accordingly.
In contrast, numerous investigations of the effect of minimal grammatical contexts (for example,
a single, closed class word with an inflection appropriate or inappropriate for the target) reveal
that considerable weight is given to the syntactic processor in lexical decision.

Obviously, the more that the outcomes of the three processors concur, the larger the prob- 
ability that the lexical decision device will succeed in making a decision in a determined period 
of time. However, before a soft molded decision device can operate on, say, grammatical incon- 
gruency, it must receive information that incongruency of some type has been detected. This 
information must come from the hard molded syntactic processor. It is reasonable, therefore, to 
expect the soft molded decision maker to be sensitive to the speed of detection of an incongruity. 
One could hypothesize that the speed of detection might depend on the type and/or number of 
grammatical violations (case violation might be considered more egregious than — and be detected 
faster than — gender violation; two violations of any type might be detected faster than any single 
one; and so on). In experimental terms, these hypothesized properties of the decision making 
device would be realized as lexical decision times on nouns in the context of possessive adjectives 
that (1) differ significantly as a function of the type of incongruency and (2) increase as a direct 
function of the number of incongruencies. 

If the outcome of the experiment runs counter to the outlined hypotheses and shows no
differential effect as a function of type or number of violations, then this lack of an effect can just
as plausibly be ascribed to the real structural (i.e., hard molded) processor as to the decision
maker. A little thought suggests that in order to do its real world job effectively, the hard molded
syntactic processor might only need to detect the fact that there is or is not a grammatical
incongruency. Therefore, a self-terminating scan of grammatical features that is associated with
binary coherence checks seems to be a plausible model of the syntactic processor. In experimental
terms, this latter perspective on the decision making device suggests that the lexical decision times
for any type and any number of incongruencies will be the same and that they will be significantly
slower than those for zero incongruencies.
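
The idea of a self-terminating scan with a binary outcome can be made concrete with a minimal sketch. Everything in it is an illustrative assumption rather than a model proposed in the text: the feature names, the scan order, and the function name are hypothetical, and the step count is returned only to make the early termination visible, not as a claim about timing.

```python
from typing import Mapping, Sequence, Tuple

FEATURES: Tuple[str, ...] = ("case", "gender", "number")  # assumed scan order


def coherence_check(
    context: Mapping[str, str],
    target: Mapping[str, str],
    order: Sequence[str] = FEATURES,
) -> Tuple[bool, int]:
    """Scan features in order and stop at the first mismatch.

    Returns (is_congruent, number_of_features_compared). The first element is
    the only thing passed on to the decision maker: it is binary and carries
    no record of which feature, or how many features, disagreed.
    """
    for steps, feature in enumerate(order, start=1):
        if context[feature] != target[feature]:
            return False, steps  # self-terminating: later features are not examined
    return True, len(order)


if __name__ == "__main__":
    target = {"case": "dative", "gender": "feminine", "number": "singular"}
    congruent = dict(target)
    one_violation = {**target, "gender": "masculine"}
    two_violations = {**target, "gender": "masculine", "number": "plural"}
    for context in (congruent, one_violation, two_violations):
        print(coherence_check(context, target))
```

On this sketch a context with one violation and a context with two violations hand the same output to the decision maker, which is the pattern the binary account predicts.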

The present experiment addresses these experimental predictions by observing the effects of
different grammatical relations (1) between possessive pronouns (sometimes referred to as posses-
sive adjectives) and nouns and (2) between possessive pronouns and pseudonouns. Pseudonouns
are created from real nouns by substituting for one of the letters in the stem. Their inflected end-
ings, therefore, are legal noun endings. In consequence, grammatical congruency can be defined
between a possessive pronoun and a pseudonoun in the same way that it can be defined between a
possessive pronoun and a noun. To the extent that grammatical relations are sustained purely by
inflectional morphemes,1 equivalent effects should be observed for acceptance latencies (nouns)
and rejection latencies (pseudonouns). In order to avoid a confound between grammatical and
physical congruence of inflectional endings, targets were limited to feminine singular nouns in the
dative case. The inflectional endings of such nouns (-I) and their congruent possessive adjectives
(MOJOJ and TVOJOJ) are physically dissimilar.

The aforementioned equivalence between effects obtained with acceptance and rejection la- 
tencies has been noted in two previous grammatical congruency experiments (Lukatela et al., 
1982, 1983). The data from a study that used possessive pronoun-noun pairs (Gurjanov et al., 
1986), as the present experiment does, were ambiguous about the equivalence. 



Seventy-two students from the Department of Psychology in the Faculty of Philosophy at the 
University of Belgrade participated in the experiment in partial fulfillment of a course requirement.
All subjects had previously participated in reaction time experiments. 



Targets were selected from a basic set of 80 nouns, all of the CCVCV type (e.g., PTICA,
"bird") drawn from the mid-frequency range (Dj. Kostic, 1965). Corresponding pseudonouns 
were formed using an entirely different set of 80 comparable nouns and changing one letter in 
the stem of each (leaving the inflectional morpheme intact). Of the 160 context-target pairs 
(see Appendix), 100 were test trials and 60 were filler trials included to equate the number of 
congruent and incongruent pairs seen by a given subject. The fillers were not analyzed. 

All targets in the test trials were singular feminine nouns of Class A (after Bidwell, 1970) in
the dative case (where the ending is /i/). Fifty of these were paired with possessive pronouns (half
first person [MY] and half second person [YOUR]) to generate five types of situations containing
ten tokens of each type: one set with no violations, three sets with one violation (where case was
accusative, gender was masculine, or number was plural) and one set with two violations (where
gender was masculine and, simultaneously, number was plural). Fifty corresponding context-
pseudonoun pairs were similarly constructed. In addition to precluding physical similarity of
inflectional endings for contexts and targets, the selection restrictions ensured that only unique
violation types were produced (test trials included only Class A feminine singular nouns in the
dative case, case violations were introduced solely with accusative contexts, and the two-violation
condition was limited to gender + number). (For example, Type A feminine nominative singular
and Type O masculine genitive singular both end in /a/ so that, had such targets been used, the
extent of the violation would be ambiguous.)

1 Although it is assumed that pseudowords have no lexical entry, there is evidence that some 
pseudowords derived from real words may access the lexical entry of the source word (e.g.,
Martin, 1982; but see Chambers, 1979). Of course, this would affect syntactically congruent and 
incongruent situations to the same extent. 
For the filler trials, 10 feminine singular accusative, 10 masculine singular dative, and 10
feminine plural dative nouns were paired with appropriate pronouns, as were a corresponding set 
of pseudonouns. 

Design 

Each subject saw 80 pronoun-noun and 80 pronoun-pseudonoun pairs, half of which were 
grammatically congruent and half of which contained at least one violation. Of the incongruent 
pairs, there were equal numbers of case, gender, number, and gender-plus-number violations. A 
given subject never encountered a given target more than once. 
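
The cell counts described for the test and filler trials can be checked with a short sketch. It rests on one reading of the text, namely that every filler pair is grammatically congruent (the fillers are said to be paired with appropriate pronouns); the constant and function names are hypothetical.

```python
# Cell sizes as described for the test and filler trials.
TOKENS_PER_CELL = 10
VIOLATION_TYPES = ("case", "gender", "number", "gender+number")
FILLER_CELLS = 3  # fem. sg. accusative, masc. sg. dative, fem. pl. dative


def pairs_per_target_type() -> dict:
    """Count the pairs one subject sees for nouns (pseudonouns are identical)."""
    test_congruent = TOKENS_PER_CELL
    test_incongruent = TOKENS_PER_CELL * len(VIOLATION_TYPES)
    fillers = TOKENS_PER_CELL * FILLER_CELLS  # assumed to be all congruent
    return {
        "test": test_congruent + test_incongruent,
        "filler": fillers,
        "congruent": test_congruent + fillers,
        "incongruent": test_incongruent,
    }


if __name__ == "__main__":
    counts = pairs_per_target_type()
    print(counts)                                    # per target type
    print({k: 2 * v for k, v in counts.items()})     # nouns plus pseudonouns
```

Under this reading the 30 fillers per target type are exactly what is needed to bring the congruent pairs up to the 40 incongruent ones.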

Procedure 

A subject was seated before the CRT of an Apple IIe computer in a dimly lit room. A fixation
point was centered on the screen. On each trial, the subject heard a brief warning signal after
which a possessive pronoun appeared for 300 ms centered above the fixation point. After a 300
ms interstimulus interval a noun or pseudonoun appeared below the fixation point for 1400 ms.
All letter strings appeared in uppercase Roman. Subjects were instructed to decide as rapidly 
as possible whether or not the second letter string was a word. To ensure that subjects were 
reading the contexts, they were occasionally asked to report both stimuli after the lexical decision 
had been made. Decisions were indicated by depressing a telegraph key with both thumbs for a 
"No" response or by depressing a slightly further key with both forefingers for a "Yes" response. 
Latencies were measured from the onset of the target. If the response latency was longer than 
1400 ms, a message appeared on the screen requesting that the subject respond more quickly. 
The experimental sequence was preceded by a practice sequence of 20 different context-target 
pairs. 

Results and Discussion 

Latencies in excess of 1400 ms and less than 400 ms were excluded from the analysis. The 
means of the subjects' latencies and errors for the three types of violations with noun and 
pseudonoun targets are presented in Table 1. Inspection of Table 1 suggests that for single 
violations, decision latencies were not distinguished by type of violation. For the noun latencies 
and errors the F ratios were less than unity by both the subjects and stimuli analyses. The F
by the subjects analysis for the pseudoword latencies exceeded unity but was not significant,
F(2,142) = 1.63, MSe = 1288, p > .10. The three other F tests on the pseudoword data (laten-
cies by stimuli and errors by subjects and stimuli) yielded values less than unity. In short, type
of violation did not differentially affect word and pseudoword latencies and errors.
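
The latency exclusion rule stated at the start of this section can be sketched as a simple filter. The function name and the sample values are hypothetical, and the sample values are made up rather than taken from the study. Latencies exactly at 400 ms or 1400 ms are kept, since the text excludes only values below the first and in excess of the second.

```python
from statistics import mean
from typing import Iterable, List


def trim_latencies(
    latencies_ms: Iterable[float],
    low_ms: float = 400.0,
    high_ms: float = 1400.0,
) -> List[float]:
    """Drop latencies below low_ms or above high_ms; the bounds themselves are kept."""
    return [rt for rt in latencies_ms if low_ms <= rt <= high_ms]


if __name__ == "__main__":
    raw = [352.0, 640.0, 671.0, 705.0, 1460.0]  # made-up latencies, not study data
    kept = trim_latencies(raw)
    print(kept, mean(kept))
```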

Given this fact, the latency and error data were collapsed over the type variable to yield
three sets of means corresponding to 0, 1, and 2 grammatical violations and these are presented
in Table 2. The effect of number of violations was evaluated on these means. Noun latencies
were significantly affected by number according to both the subjects and the stimuli analyses,
F(2,142) = 5.36, MSe = 1402, p < .01 and F(2,118) = 5.95, MSe = 1311, p < .01, respec-
tively. The same statistical outcomes were obtained for the pseudonoun latencies: F(2,142) =
4.65, MSe = 1147, p < .01 by the subjects analysis and F(2,118) = 4.86, p < .01 by the stim-
uli analysis. Errors in noun decision making were significantly affected by number of violations
according to both the subjects and stimuli analyses: F(2,142) = 4.97, MSe = 34, p < .01 and
F(2,118) = 7.37, MSe = 31, p < .001. In contrast, number did not affect pseudonoun errors. The
ANOVAs on subjects and stimuli means both yielded F ratios less than unity.

Table 1

Lexical Decision as a Function of Type of Grammatical Violation

                                    Type of Violation
Target        Measure            Case     Gender    Number
Noun          latency (ms)        671       675       671
              error (percent)     4.4       6.0       6.0
Pseudonoun    latency (ms)        718       708       717
              error (percent)     2.6       3.5       2.1

Table 2

Lexical Decision as a Function of Number of Violations

                                   Number of Violations
Target        Measure              0         1         2
Noun          latency (ms)        656       672       675
              error (percent)     3.2       5.5       6.1
Pseudonoun    latency (ms)        730       714       715
              error (percent)     3.3       2.7       3.6

Protected t-tests (where the error term from the ANOVA is used as the estimate of the
variance; see Cohen & Cohen, 1975) were conducted on the means for the 1 versus 2 violations.
No significant differences were obtained. 
The results of the experiment are fairly straightforward. First, there was a grammatical
congruency effect, and it was observed for both nouns and pseudonouns. Second, the magnitude 
of the effect for both nouns and pseudonouns was indifferent to the type and the number of 
grammatical violations. 

Let us consider the first result. Possessive pronoun-noun pairings that were in full grammati- 
cal agreement were associated with faster lexical decisions than possessive pronoun-noun pairings 
that disagreed on one or two grammatical dimensions. Similarly, possessive pronoun-pseudonoun 
pairings that were in full grammatical agreement (the pseudonoun's inflection agreed in case, gen- 
der, and number with the possessive pronoun's inflection) were associated with slower rejection 
latencies than pairings in which the agreement was incomplete by one or two dimensions. The 
magnitude of the grammatical congruency effect in the noun latency data was 16 ms for zero 
versus one violation and 19 ms for zero versus two violations. Gurjanov et al. (1986) obtained
a congruency effect for zero versus one violation of the order of 51 ms (calculated from the data
on feminine nouns preceded by possessive pronouns reported in their Table 2). In the course of 
the latter experiment, only one type of disagreement ever occurred, namely, in gender. It con- 
trasts, therefore, with the present experiment in which all three types of possible disagreement 
occurred and in which the number of disagreements was frequently two. The large difference in 
the magnitudes of the congruency effect defined over possessive pronoun-noun pairs in the two 
experiments is probably attributable to these differences in the homogeneity of grammatical ma-
nipulations. The situation may be analogous to that in associative priming experiments. Tweedy,
Lapinski, and Schvaneveldt (1977) showed that the facilitation due to an associative context was 
greater with a larger proportion of associative trials. They interpreted this result within Posner 
and Snyder's (1975) two-factor theory of attention. Focusing on the conscious attentional com- 
ponent, Tweedy et al. (1977) argued that the subjects' expectation concerning the relatedness 
of the items allows for a specialized post-lexical control strategy (cf. Shiffrin & Schneider, 1977) 
to be brought into effect. In principle, the decision making device in the Gurjanov et al. (1986)
experiments could concentrate on just the gender dimension. The concentration in the present 
experiment could not have been as focused because the subjects' expectancies were that any one 
of the dimensions of grammatical agreement could be violated with near equal probability. 

The magnitude of the grammatical congruency effect on word (noun) latencies in the present
experiment compares favorably with the magnitudes of syntactical congruency effects reported
for English language two-word sequences by Goodman et al. (1981) and Seidenberg et al. (1984).
In the two experiments of Goodman et al. the magnitudes were 19 ms and 15 ms. In the single
experiment of Seidenberg et al. the magnitude was 13 ms. A further favorable comparison is
to be found between the respective error productions. In the present experiment the percent
error for the congruent condition was 3.19. For the single and double incongruency conditions the
percent errors were 5.42 and 6.11, respectively, to yield congruent-incongruent differences of 2.24
percent and -2.92 percent. Significant differences in error production between congruent and
incongruent conditions on the order of -4.0 percent and 1.3 percent were reported, respectively,
for the first of Goodman et al.'s experiments and for the Seidenberg et al. experiment. In the
Gurjanov et al. (1986) study, the congruence-incongruence error production difference (averaged
over masculine and feminine nouns of typical and atypical declension) amounted to 2.7 percent.

The grammatical congruency effect in the pseudonoun latency data was -16 ms for the 0
versus 1 comparison and -15 ms for the 0 versus 2 comparison. These rejection latency differences
complement the acceptance latency differences and they concur in this respect with the results of
several previous experiments that used pseudoverbs and pseudoadjectives as well as pseudonouns.
We will summarize these findings briefly before elaborating the significance of grammatical effects 
with pseudowords. 
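
The effect magnitudes quoted for nouns and pseudonouns follow directly from the mean latencies in Table 2. The short sketch below recomputes them; the dictionary layout and the function name are illustrative choices, while the numbers are the Table 2 means.

```python
# Mean latencies (ms) from Table 2, indexed by number of violations.
TABLE_2_LATENCY_MS = {
    "noun": {0: 656, 1: 672, 2: 675},
    "pseudonoun": {0: 730, 1: 714, 2: 715},
}


def congruency_effect(target: str, n_violations: int) -> int:
    """Latency with n violations minus latency with no violations, in ms."""
    cell = TABLE_2_LATENCY_MS[target]
    return cell[n_violations] - cell[0]


if __name__ == "__main__":
    for target in TABLE_2_LATENCY_MS:
        print(target, congruency_effect(target, 1), congruency_effect(target, 2))
```

A positive value means that incongruency slowed the response (nouns, accepted as words); a negative value means that incongruency sped it up (pseudonouns, rejected).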

The preposition-noun experiment of Lukatela et al. (1983) included pseudonouns that were
mostly but not exclusively generated by the substitution of the first letter of a noun keeping the
inflected ending legal. An interaction between preposition and pseudonoun type (nominative-
like, dative/locative-like, instrumental-like) was obtained with subject variability as the error
term but not with stimulus variability as the error term. The data suggested that where the
inflection of a pseudonoun agreed with the preceding preposition, the rejection latencies were
slowed (by approximately 18 ms) relative to when they were in disagreement. For the noun
targets grammatical agreement with the preceding preposition hastened (by approximately 28
ms) positive decisions relative to grammatical disagreement. In the pronoun-verb experiments
of Lukatela et al. (1982) all pseudoverb stimuli were inflected with verb endings. They were
created by single letter substitution in the stems of the verbs. These experiments also provided 
evidence for complementary effects between the positive and negative latencies. Taking the first 
experiment of Lukatela et al. (1982) as an example, grammatical congruency resulted in faster 
(by 128 ms) positive decisions and slower (by 2Y ms) negative decisions. Finally, the experiments 
of Gurjanov et al. (1985) that examined adjective-noun pairings should be mentioned. These 
experiments found no evidence of a grammatical congruency effect with pseudonoun targets. They 
did demonstrate, however, a grammatical congruency effect with pseudoadjective-noun pairs (that 
is, on positive decision latencies) that was as large as the effect for adjective-noun pairs. 

The significance of demonstrating grammatical congruency effects with legally inflected pseu- 
dowords as either contexts or targets is that it points to the main carriers of grammatical infor- 
mation, the inflectional morphemes, as largely responsible for the effect. In more theoretical 
terms, it lends support to the hypothesis that the syntactic level of processing operates relatively 
independently from the semantic-interpretative processes (Forster's message processor). When 
pseudowords are used as either contexts or targets, the "word" sequence is meaningless. Conse-
quently, one cannot appeal to a process of sentence comprehension to effect, in top-down fashion, 
the syntactic analysis. Further, when pseudowords are used as either contexts or targets, the 
lexical processor must deliver definitional information, to use Fodor's (1983) term, pertaining to 
the grammatical function of the pseudoword's inflection. The implication is that lexical processes 
work with a morphemic inventory and can effectively distinguish morphemic constituents in the 
absence of activating full (that is, word) lexical entries. That the grammatical congruency effect 
is demonstrable with pseudowords means that the lexical processes provide acceptable inputs to
the syntactic processes. We must, nevertheless, be careful of carrying this line of argument too
far. The grammatical congruency effect is less reliable for pseudowords than words. And this
difference is probably telling us (not surprisingly) that the stem as well as the suffix is a source 
of grammatical information. The lexical processor working with words rather than the con- 
stituents of words can more reliably furnish definitional information about the parts of speech. 
Serbo-Croatian nouns share many of their inflections with other word types (most notably with 
adjectives but also with the cardinal numerals). To the extent that stem information is not ac- 
cessed, the identity of a letter string as a noun is less clear and the lexical processor is less able
to provide acceptable resources for the syntactic operations. 

Another reason that the grammatical congruency effect is more difficult to reveal with pseu- 
doword targets is that the process of isolating affixal information in pseudowords may be slower 
than in words. In consequence, the lexical search determining the absence of a pseudoword's en-
try may often be completed before affixal information about the pseudoword has been discerned 
(Wright & Garrett, 1984). Under these conditions no contribution of the syntactic processor 
would be expected. 

The second result of the present experiment was that the magnitude of the grammatical con- 
gruency effect, for both nouns and pseudonouns, was indifferent to the type of violation and to 
the number of violations (one or two). In terms of the arguments raised in the introduction, this 
result suggests that the information of relevance to the decision making process is merely that the 
two words do not agree grammatically. Type of disagreement and the number of disagreements 
do not affect the magnitude of the negative bias (that hinders positive decisions and aids negative 
decisions). Each type of grammatical disagreement (case, gender, and number) contrasted with 
complete agreement. This fact of a grammatical congruency effect defined with respect to each 
violation suggests that, in the experiment, syntactic processors were evaluating all three gram-
matical relations between the possessive pronoun context and the noun or pseudonoun target.
From the perspective of the job that these processors ordinarily perform in everyday sentence
comprehension, namely, assigning grammatical structure to word sequences, it may well be that
the assignment relies differentially on case, gender, and number information. This possibility
cannot be ruled out by the fact that in the present experiment each type of grammatical dis-
agreement contrasted with full agreement to the same degree and by the related fact that two
disagreements were no worse than one.

Inferences from lexical decision data to underlying linguistic mechanisms have to contend 
with the soft algorithmical capabilities assembled specifically for the task. As suggested in the
introduction, it is useful to construe a subject in a lexical decision task as assembling him or herself 
into a device specially tailored to the goal of passing rapid judgment on the lexical status of a 
letter string. The subject, of course, is a language processor, a complex device that ordinarily
analyzes multiple embeddings of linguistic structures of different grains, and does so on line.
Fashioning a device tailored to lexical decision can be regarded as the fashioning of an alternative 
description of the language processor (see Pattee, 1972, for a general argument of this kind with 
regard to biological functions). This alternative (simpler) description makes explicit some of the 
detailed processing that is implicit in ordinary sentence comprehension. The important point 
to be underscored is that the special purpose lexical decision device as an alternative (simpler)
description of the language processor is selective. It does not make explicit all of the processing
detail. Thus, it suffices for lexical decision to make explicit grammatical conformity. The nature
and time course of the processing details that determine grammatical conformity remain, however,
largely implicit.

APPENDIX

For the experimental situations, all word (•) and pseudoword targets are feminine singular nouns 
in the dative case. Possessive adjective contexts are either grammatically congruent or violate 
case, number, gender, or number + gender.

For the filler situations, nouns and pseudonouns were either feminine singular accusative,
masculine singular dative, or feminine plural dative.

LOW CONSTRAINT FACILITATION IN LEXICAL DECISION WITH
SINGLE WORD CONTEXTS* 



G. Lukatela,** Claudia Carello,† A. Kostic,** and M. T. Turvey‡

Abstract. Single word, low constraint adjective contexts were used
to "prime" lexical decision to noun targets in Serbo-Croat. Semantically
congruent situations consisted of adjective-noun pairs that were
not highly predictable but were nonetheless plausible (e.g., GOOD-AUNT).
Semantically incongruent situations used pairs that were implausible
(e.g., SLOW-COAT). All adjective-noun pairs were grammatically
congruent and were compared to a neutral xxx baseline. In
Experiment 1, at a stimulus onset asynchrony of 300 ms, congruous
situations showed 59 ms of facilitation while incongruous situations
did not differ from the baseline. The same pattern was repeated in
Experiment 2, at a stimulus onset asynchrony of 800 ms. Congruous
situations were facilitated 67 ms. Results were discussed in terms
of a message-level coherence check in Forster's (1979) model of autonomous
levels of language processing.

Introduction 

The existence of facilitating sentence context effects has been considered to be of much 
theoretical significance. Recent interest has centered on the difference between low constraint or 
unfocused contexts (those for which many completions are appropriate but no one is particularly
predictable) and high constraint or focused contexts (those for which a particular completion
is highly predictable). The issue concerns whether or not low constraint context effects occur
and, if they do, whether they should be interpreted as arising from generalized priming.
A generalized priming interpretation means that a very large set of lexical items is primed, or
the features generated are few and general, or subjects' attention is focused on a wide range of
completions. Such explanations suggest that higher level knowledge and expectations can relate
interactively with lower level processes such as word recognition (e.g., Sanocki, Goldman, Waltz,
Cook, Epstein & Oden, 1985; Schwanenflugel & Shoben, 1985).

* American Journal of Psychology, in press 
** University of Belgrade, Serbia, Yugoslavia 

† Trinity College

‡ Also University of Connecticut

Acknowledgment. This research was supported in part by National Institute of Child Health and
Human Development Grants HD-08495 and HD-01994 to the University of Belgrade and Haskins
Laboratories, respectively.

Haskins Laboratories Status Report on Speech Research SR-89/90 (1987)




Such an account contrasts with approaches that maintain the autonomy of different levels of 
processing (e.g., Forster, 1979, 1981; West & Stanovich, 1982). The levels are separate and hierarchically
arranged: The lexical processor receives input only from feature analysis; the syntactic
processor receives input only from the lexical processor; the message processor receives input only
from the lexical processor and the syntactic processor. Clearly, sentence context effects cannot 
influence lexical processing. 

...[E]ffects due to lexical context (i.e., single word contexts) are entirely acceptable 
within this theory, since they can be described as within level effects rather than between 
level effects. That is, the lexical context effect is assumed to be mediated by structural 
properties entirely internal to the lexical processor itself, and no other level of processing 
need be involved (Forster, 1979). Viewed from this perspective, then, the possibility that 
lexical and sentence context effects might have different properties takes on considerable 
significance (Forster, 1981, p. 471). 

The data from semantic sentence context effects reveal that appropriate semantic completions
are fast relative to inappropriate completions. But when compared to a neutral baseline,
results are mixed. For high constraint sentences, appropriate completions are usually facilitated
and inappropriate completions are inhibited (Forster, 1981; Schwanenflugel & Shoben, 1985;
although see Fischler & Bloom, 1979, for predictable completions that did not differ significantly
from the baseline). For low constraint sentences, inappropriate completions are inhibited but
appropriate completions either show no difference relative to a neutral baseline (Fischler & Bloom,
1979; Forster, 1981) or show significant facilitation that is less than that found for predictable
completions (Schwanenflugel & Shoben, 1985).

The Serbo-Croatian language has provided a convenient medium for exploring low constraint
contexts. Although the investigations have used syntactic rather than semantic contexts, they are 
nonetheless instructive for present purposes. As an inflected language, Serbo-Croat permits the 
creation of highly salient grammatical contexts with a single word (note, in contrast with Forster, 
1981, that single words need not be simply lexical contexts). Furthermore, it does not require 
that word class be violated in order to obtain grammatical incongruency as is typically done with 
English language materials (e.g., Wright & Garrett, 1984). For example, adjectives and nouns 
must agree in gender (masculine, feminine, or neuter), case (e.g., nominative, dative, accusative, 
etc.), and number (singular or plural). When a context and target agree on these dimensions, 
lexical decision is faster than when one or more of the dimensions is incongruent (Gurjanov, 
Lukatela, Moskovljevic, Savic, & Turvey, 1985). Similar effects have been found for pronoun-verb
pairs with respect to person (Lukatela, Moraca, Stojnov, Savic, Katz, & Turvey, 1982),
preposition-noun pairs with respect to case (Lukatela, Kostic, Feldman, & Turvey, 1983), and
possessive adjective-noun pairs with respect to gender (Gurjanov, Lukatela, Lukatela, Savic, &
Turvey, 1985). To date, these grammatical congruency effects have been defined over the difference
between congruent and incongruent situations and have not employed a neutral baseline. Relative 
amounts of facilitation and inhibition are not known. 

These low constraint syntactic context effects are germane to the current discussion because 
they have been interpreted within a framework that is continuous with Forster's model of autonomous
levels. The outputs of each level are considered to be available to the decision making
device. In the normal course of language comprehension, all of these outputs are important and
the processor heeds all of them. When the processor becomes specialized for lexical decision,
it cannot obviate this characteristic. That is to say, even though lexical decision needs output
from the lexical processor alone, the other subprocessors cannot be disengaged; their outputs,
in the form of syntactic and pragmatic coherence checks, bias the decision making device. A
positive bias, as when the context and target are grammatically congruent or pragmatically plausible,
hastens lexical decisions relative to a negative bias, as when the context and target are an
ungrammatical or implausible combination.

It is important to note that, in contrast to associative priming, these context effects are 
post-lexical. They do not change the speed with which a lexical entry is found. And they allow a 
form of automatic processing (deGroot, Thomassen, & Hudson, 1982) that is different from the 
spreading activation assumed to operate in the lexicon. If information needed for the coherence 
evaluation is provided in the lexical entries for context and target, then low constraint contexts 
(e.g., minimal grammatical contexts, unfocused sentence contexts) can have a facilitating (or, 
unlike spreading activation, an inhibiting) influence on lexical decision times without entailing
the unlikely assumption that broad classes of items in the lexicon — for example, all feminine 
singular nouns — are activated or attended to. 

One word contexts are useful because they allow tight control on the context-target associative
relationship (e.g., it cannot accumulate insidiously from several words in the context) and on
the stimulus onset asynchrony (SOA). This last benefit is of importance because in contrast to
spreading activation, which decays over time, post-lexical coherence checks should be indifferent
to the interval between context and target. Their output is simply "coherent" or "not coherent"
and this will not change over time (although, presumably, there is an upper limit after which the
context and target will no longer be considered as part of the same situation). Whatever pattern
of facilitation and inhibition is found at a short SOA, therefore, should be repeated at a long
SOA.



The situations to be explored in the present experiments are low constraint, single word
semantic contexts. Grammatically congruent, semantically plausible adjective-noun pairs and
grammatically congruent, semantically implausible adjective-noun pairs will be evaluated relative
to xxx-noun baselines (see Footnote 1). A positive bias from both the syntactic and message
processors should produce facilitation relative to the neutral baseline. But a positive bias from
the syntactic processor and a negative bias from the message processor should effectively cancel
each other, making that condition no different from a neutral context. Experiment 1 will investigate
this contrast at an SOA of 300 ms and Experiment 2 will use an SOA of 800 ms.
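
The predictions in the preceding paragraph can be written out as a toy calculation. The sketch below is only an illustration: it assumes that each coherence check contributes +1 (coherent), -1 (not coherent), or 0 (not engaged, as is assumed here for the xxx baseline) and that the biases simply add. The text says only that opposing biases cancel, so the numeric coding and the function names are hypothetical.

```python
def decision_bias(syntactic, message):
    """Combine coherence-check outputs (assumed coding: +1, -1, or 0)."""
    # Assumption for illustration: the biases combine additively.
    return syntactic + message


def predicted_effect(syntactic, message):
    """Predicted lexical decision effect relative to the neutral baseline."""
    bias = decision_bias(syntactic, message)
    if bias > 0:
        return "facilitation"
    if bias < 0:
        return "inhibition"
    return "no difference from baseline"


# Grammatically congruent, semantically plausible pair (e.g., GOOD-AUNT)
print(predicted_effect(+1, +1))
# Grammatically congruent, semantically implausible pair (e.g., SLOW-COAT)
print(predicted_effect(+1, -1))
```

Under this coding the plausible pair yields facilitation and the implausible pair yields no difference from baseline, which is the pattern the experiments are designed to test.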

1 DeGroot et al. (1982) warn that the xxx baseline may, in fact, be inhibitory and that a more 
neutral context is provided by a word such as "blank." Because the Serbo-Croatian language 
is inflectional, however, all words are marked for a grammatical role. Consequently, almost any 
seemingly neutral word would necessarily facilitate those words with which it was grammatically 
congruent and inhibit those with which it was incongruent. An exception is provided by noun 
contexts for noun targets — such pairs do not create a syntactic situation (Lukatela & Popadic, 
1979) — but these introduce the possibility of associative or semantic relatedness. While the 
concerns of deGroot et al. (1982) are important, it may be that in Serbo-Croat, xxx contexts are 
as neutral as it gets. It has been suggested that a high proportion of baseline trials may serve 
to limit the inhibitory influence of xxx contexts (deGroot et al., 1982). Both of our experiments 
follow this recommendation by including 50% baseline trials.

Experiment 1

Method 

Subjects. Twenty-six students from the Department of Psychology in the Faculty of Philosophy
at the University of Belgrade participated in the experiment in partial fulfillment of a
course requirement. All subjects had previously participated in reaction time experiments. 

Materials. Critical context-target pairs consisted of 26 congruous adjective-noun pairs (e.g., 
BELI GOLUB, "white pigeon") and 26 incongruous adjective-noun pairs (e.g., VUNENA SKOLA, 
"woolen school") drawn from the mid-frequency range (Dj. Kostic, 1965). Targets ranged from 
4-7 letters in length. (Because associative norms do not exist for Serbo-Croat, possible associative 
relationships were eliminated on the basis of a pretest.) All pairs were in the nominative case. 
Half of the pairs (in both conditions) were feminine and half were masculine. In addition, 52 
adjective-pseudonoun pairs were constructed in which the pseudonouns differed from real words 
by replacing one or two letters but preserving the inflectional ending so that the pairs would not 
be grammatically incongruent. 2 The adjectives were the same as those that had been paired with 
the nouns. Finally, 104 baseline pairs were constructed by appending a context of 3 crosses (xxx) 
to all of the nouns and pseudonouns. 

Design. Each subject saw 26 adjective-noun pairs (half congruent and half incongruent), 
26 adjective-pseudonoun pairs, 26 xxx-noun pairs, and 26 xxx-pseudonoun pairs. Subjects were
randomly assigned to one of two counterbalancing groups as illustrated in Table 1. A given 
subject never encountered a given target or context (other than xxx) more than once. 



Table 1

Illustration of the Design and (Translated) Examples
of Stimuli Used in the Experiments

                         Context-target relation
          ---------------------------------------------------------
                            Noun
          -----------------------------------------
Gender    Congruous      Incongruous    Neutral      Pseudoword

F         THIN-HAIR      SLEEPY-DOOR    XXX-AUNT     GOOD-GREEB
M         DEEP-POT       SLOW-COAT      XXX-DEER     SPEEDY-CLUD

F         GOOD-AUNT      SOUR-CAT       XXX-HAIR     THIN-SPORL
M         SPEEDY-DEER    HAPPY-NAIL     XXX-POT      DEEP-LORT


Procedure. A subject was seated before the CRT of an Apple IIe computer in a dimly
lit room. A fixation point was centered on the screen. On each trial, the subject heard a
brief warning signal after which an adjective or xxx appeared for 300 ms centered above the
fixation point. Immediately after the context disappeared (SOA of 300 ms) a noun or pseudonoun
appeared below the fixation point for 1400 ms. All letter strings appeared in uppercase Roman.
Subjects were instructed to decide as rapidly as possible whether or not the second stimulus was
a word. To ensure that subjects were reading the contexts, they were occasionally asked to report
both stimuli after the lexical decision had been made. Decisions were indicated by depressing
a telegraph key with both thumbs for a "No" response or by depressing a slightly further key
with both forefingers for a "Yes" response. Latencies were measured from the onset of the target.
If the response latency was longer than 1500 ms, a message appeared on the screen requesting
that the subject respond more quickly. The experimental sequence was preceded by a practice
sequence of 20 different context-target pairs.

2 For pseudonouns following nominative adjectives, the pairs cannot be decisively congruent,
though, because of the way in which case is marked in nouns. An inflection such as -A indicates
nominative for feminine singular nouns but genitive for singular masculine nouns. That is, access
of the lexicon is required in order to render the inflection unambiguous.

Results 

Latencies in excess of 1500 ms and less than 350 ms were excluded from the analysis. The
means of the subjects' latencies are shown in Figure 1 and their percentage errors (wrong and
slow responses) are presented in Table 2. (None of the error analyses revealed any significant
differences.) A prime x congruence ANOVA on the acceptance latencies revealed a main effect
of prime, F(1,25) = 8.04, MSerr = 1909.44, p < .01 (word primes averaged 674.5 ms while xxx
primes averaged 699 ms) and congruence, F(1,25) = 5.54, MSerr = 2452.07, p < .03 (congruent
situations averaged 675.5 ms, while incongruent situations averaged 698 ms). The prime x
congruence interaction was significant, F(1,25) = 28.85, MSerr = 1083.95, p < .001. Protected
t-tests (Cohen & Cohen, 1975; the error term from the ANOVA is used as the estimate of the
variance) were conducted on the means for congruous versus baseline, t(25) = 4.87, p < .01, and
incongruous versus baseline, t(25) = .82, p > .10. In other words, there was facilitation but no
inhibition.
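
As a check on the arithmetic, the four cell means of the prime x congruence design can be recovered from the marginal means reported above together with the 59 ms of facilitation reported in the abstract. The sketch below assumes that each marginal mean is the unweighted average of its two cells and that facilitation is the xxx-congruous mean minus the word-congruous mean. The function and key names are illustrative and are not part of the original analysis, and the recovered values are only as exact as the rounded figures they are derived from.

```python
def recover_cell_means(word_prime, xxx_prime, congruent, incongruent, facilitation):
    """Recover 2 x 2 cell means (ms) from marginal means and one simple effect.

    Assumes unweighted marginals and
    facilitation = (xxx, congruous) - (word, congruous).
    """
    # The congruent marginal is the midpoint of the two congruous cells,
    # and the cells differ by the facilitation effect.
    word_con = congruent - facilitation / 2
    xxx_con = congruent + facilitation / 2
    # The remaining cells follow from the two prime marginals.
    word_inc = 2 * word_prime - word_con
    xxx_inc = 2 * xxx_prime - xxx_con
    # Consistency check against the marginal that was not used above.
    assert abs((word_inc + xxx_inc) / 2 - incongruent) < 1e-9
    return {
        "word_congruous": word_con,
        "xxx_congruous": xxx_con,
        "word_incongruous": word_inc,
        "xxx_incongruous": xxx_inc,
    }


# Experiment 1 acceptance latencies (subjects analysis), values from the text
cells = recover_cell_means(674.5, 699.0, 675.5, 698.0, 59.0)
print(cells)
# Incongruous word-primed mean minus its xxx baseline
print(cells["word_incongruous"] - cells["xxx_incongruous"])
```

Under these assumptions the incongruous word-primed targets come out about 10 ms slower than their xxx baseline, a difference that the protected t-test above did not find to be significant.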

[Figure 1 not reproduced. Vertical axis: presumably latency in ms (ticks from 500 to 800);
horizontal axis: context-target relation (congruous, incongruous).]

Figure 1. Average lexical decision latencies to word and pseudoword targets as a function of the semantic
relationship between context and target at an SOA of 300 ms (Experiment 1).

Table 2

Percentage of Incorrect Lexical Decisions for Semantically
Congruous and Incongruous Pairs with an SOA of 300 ms

                                 Words              Pseudowords
Context-target relation a     Prime    XXX        Prime    XXX

Congruous                     2.37     2.68       2.37     0.59
Incongruous                   1.18     2.07       1.48     1.18

a Labels are defined for words and applied to pseudowords with corresponding contexts.




The pattern of results was largely corroborated by the stimulus analysis of acceptance latencies.
The effect of prime was again significant, F(1,50) = 6.24, MSerr = 2588.71, p < .02,
but the effect of congruence was not, F(1,50) = 3.51, MSerr = 3804.01, p < .07. The interaction,
F(1,50) = 11.78, MSerr = 2588.71, p < .001, revealed the same pattern of facilitation
as was found in the subjects analysis: protected t-tests indicated that there was facilitation
for congruous situations, t(50) = 5.93, p < .01, but not inhibition for incongruous situations,
t(50) = .93, p > .10.

For the rejection latencies, there was no effect of congruence, F < 1. The effect of prime
was significant, F(1,25) = 10.91, MSerr = 891.73, p < .01 (word primes averaged 744.5 ms;
xxx primes averaged 725.0 ms). Their interaction was significant, F(1,25) = 11.17, MSerr =
1175.95, p < .01. Protected t-tests revealed inhibition of the "congruent" pseudowords, t(25) =
5.07, p < .01, but no effect on "incongruent" pseudowords, t(25) = .30, p > .10.

This was duplicated in the stimulus analysis of rejection latencies. Prime was significant,
F(1,50) = 5.05, MSerr = 2003.52, p < .03, but congruence was not, F < 1. The interaction
was again significant, F(1,50) = 6.52, MSerr = 2003.52, p < .02. Protected t-tests revealed
inhibition in the congruous situations, t(50) = 4.8, p < .01, but no difference for incongruous
situations, t(50) = .30, p > .10.



Experiment 2

Method

Subjects. Twenty-six students from the Department of Psychology in the Faculty of Philosophy
at the University of Belgrade participated in the experiment in partial fulfillment of a
course requirement. All had experience in reaction time experiments but none had participated
in Experiment 1.

Materials and design. The same as Experiment 1. 

Procedure. The same as Experiment 1 with the exception that the SOA was 800 ms.

Results

Latencies in excess of 1500 ms and less than 350 ms were excluded from the analysis. The
means of subjects' latencies are shown in Figure 2 and their percentage errors (wrong and slow
responses) are presented in Table 3. A prime x congruence ANOVA on the acceptance latencies
revealed significant differences of prime, F(1,25) = 33.78, MSerr = 1082.71, p < .001 (with
word primes averaging 626.5 ms and xxx primes averaging 664 ms), congruence, F(1,25) =
4.93, MSerr = 2482.99, p < .04 (with congruent situations averaging 634.5 ms and incongruent
situations averaging 656 ms), and a prime x congruence interaction, F(1,25) = 17.92, MSerr =
1241.14, p < .001. Protected t-tests revealed significant facilitation for congruous nouns, t(25) =
7.34, p < .01, but not inhibition for incongruous nouns, t(25) = .87, p > .10. The error analysis
revealed an effect of prime, F(1,25) = 10.21, MSerr = 10.92, p < .004.

Figure 2. Average lexical decision latencies to word and pseudoword targets as a function of the semantic
relationship between context and target at an SOA of 800 ms (Experiment 2).



For the rejection latencies, there was an effect of prime, F(1,25) = 12.55, MSerr = 1572.83, p
< .002 (word primes averaged 728.5 ms, xxx primes averaged 701 ms) but neither congruence,
F(1,25) = 3.34, MSerr = 1286.6, p < .08, nor the interaction, F < 1, reached significance. No
differences were found by the error analysis.

In the stimulus analysis of acceptance latencies the effect of congruence was not significant,
F(1,50) = 2.49, MSerr = 4351.19, p > .10. The main effect of prime, F(1,50) = 12.99, MSerr =
2925.91, p < .001, and the interaction, F(1,50) = 8.29, MSerr = 2525.91, p < .01, were significant.
The error analysis showed an effect of prime, F(1,50) = 7.38, MSerr = 15.11, p < .01. For
rejection latencies, prime was significant, F(1,50) = 8.19, MSerr = 2589.09, p < .01. Neither the
effect of congruence nor the interaction reached significance, F < 1. No significant differences
were found in the error analysis.

Table 3

Percentage of Incorrect Lexical Decisions for Semantically
Congruous and Incongruous Pairs with an SOA of 800 ms

                                 Words              Pseudowords
Context-target relation a     Prime    XXX        Prime    XXX

Congruous                     0.89     2.07       2.66     0.89
Incongruous                   1.18     4.14       2.07     1.48

a Labels are defined for words and applied to pseudowords with corresponding contexts.



Discussion 



As expected, plausible low constraint semantic contexts produced a facilitatory effect on word 
recognition while implausible low constraint semantic contexts yielded lexical decision times that 
were not different from a neutral baseline. The lack of inhibition for incongruous situations would
not be predicted by a generalized priming story (e.g., Schwanenflugel & Shoben, 1985). This is 
particularly true at the longer SOA (cf. Neely, 1977) where the effect of attentional processes 
ought to be greater. Indeed, Becker (1980) has demonstrated inhibition dominance when the set 
of expected targets is not narrow. This latter result was obtained with associates (where the 
context was a category and target could be a typical or nontypical member of that category), 
however, and would not have tapped the semantic plausibility of a particular pair. We conjecture
that the lack of inhibition in the semantically implausible situations in the present experiments
derived from the fact that, because all situations were grammatically congruent, a positive bias 
from the syntactic coherence check cancelled the negative bias from the message level coherence 
check. The resulting situation was equivalent to having no context. 3 

Superficially, it might seem that the pseudoword data, which generally showed inhibition
relative to the baseline, contradict this interpretation: Why isn't negative bias from the message
processor cancelled by positive bias from the syntactic processor? We suspect that, because
of the way in which case is marked in nouns, the syntactic processor is put into a "holding
pattern," giving neither negative nor positive bias. Negative bias is absent because the syntactic
relationship of the adjective-pseudonoun pairs is not immediately suspect. A negative bias would
occur if the pseudonoun's inflection unambiguously indicated that its case was inappropriate for
the preceding adjective (e.g., BELI BRAKU is unequivocally incongruent because the nominative
adjective is followed by a pseudonoun marked for the accusative case). But such situations were
not used here. Nonetheless a positive bias cannot be given either, because the inflections with
which pseudonouns were constructed were ambiguous. For example, whether -A indicates that a
singular noun is genitive (and, therefore, incongruent) or nominative (and, therefore, congruent)
depends on the noun's gender (see Footnote 2). The problem arises because, for nouns, gender
information is obtained from the lexicon, not the surface morphology of the letter string. There is,
of course, no lexical entry for pseudonouns. This means that the syntactic processor continues to
run, waiting for the information it needs to evaluate these syntactic situations. It would most likely
be stopped only when a general system-level decision deadline is reached (cf. Coltheart, Davelaar,
Jonasson, & Besner, 1977). In contrast, xxx contexts should not engage the syntactic processor;
the situations they create should be recognized as situations not requiring syntactic evaluation.
When xxx-pseudoword (or xxx-word) is encountered, therefore, the syntactic processor makes
no attempt to assign a syntactic structure to it. Decision time in nonsyntactic contexts can
be influenced simply by the lexical processor, yielding faster responses than when the syntactic
processor is caught up "waiting for Godot."

3 Because association norms have not been compiled for the Serbo-Croatian language, one
might argue that the experimental materials were, in fact, weak associates and nonassociates
rather than low constraint plausible and implausible contexts. If this were the case, however,
then we should expect no effect on the former and inhibition on the latter (cf. deGroot et al.,
1982).

The question of whether or not the syntactic processor is engaged during the experimental 
situation also speaks to the difference between the present pseudoword data, where xxx contexts 
have a facilitating effect relative to word contexts, and other research, where the neutral context 
has an inhibiting effect (e.g., Balota, 1983; deGroot et al., 1982; Neely, 1976, 1977). As noted,
the adjective-noun and adjective-pseudonoun situations used in the present investigation involved 
syntactic as well as semantic relations. More commonly, noun-noun associative pairs are employed 
and these appear not to be treated as syntactic situations (e.g., two semantically unrelated nouns 
that are in the same case do not show facilitation relative to those same nouns in incongruent cases 
[Lukatela & Popadic, 1979]). The difference between unrelated word contexts and xxx contexts
is, as deGroot et al. (1982) have argued, attributable to the inhibiting influence of xxx. In the
present experiments, that inhibiting influence was either nullified by the high proportion of xxx 
trials (see Footnote 1) or counteracted by the futile attempts at a syntactic evaluation. 

Further support for an interpretation in the framework of autonomous coherence checks 
comes from the duplication of the facilitation pattern at the short and long SOAs. The amount
of facilitation was similar— 59 ms at SOA 300 ms and 67 ms at SOA 800 ms— and the amount 
of inhibition was small and not significant at either interval. In contrast to a priming account, 
it can be argued that congruence effects defined at the syntactic or message levels ought to be 
rate-independent. Because the processing takes the form of a coherence evaluation with simply 
a positive or negative result, there is no avenue for time (other things being equal) to influence 
the outcome of the evaluation. The overall hastening of lexical decision from 300 ms SOA to 
800 ms SOA (by 42 ms for words and 20 ms for pseudowords) is likely to be a general result
of preparatory processes common to reaction time tasks (Gottsdanker, 1980) rather than an 
indication of a change in language processing at the two intervals. 
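
The overall hastening from the short to the long SOA can be checked against the marginal means reported in the two Results sections. The sketch below assumes that the overall mean for each target type is the unweighted average of its word-prime and xxx-prime means (subjects analyses). The dictionary layout and function names are illustrative only.

```python
# (word-prime mean, xxx-prime mean) in ms, as reported in the Results sections
EXP1_SOA_300 = {"word": (674.5, 699.0), "pseudoword": (744.5, 725.0)}
EXP2_SOA_800 = {"word": (626.5, 664.0), "pseudoword": (728.5, 701.0)}


def overall_mean(prime_means):
    """Unweighted average of the word-prime and xxx-prime means."""
    return sum(prime_means) / len(prime_means)


def soa_speedup(short_soa, long_soa):
    """Overall latency at the short SOA minus overall latency at the long SOA."""
    return {
        target: overall_mean(short_soa[target]) - overall_mean(long_soa[target])
        for target in short_soa
    }


print(soa_speedup(EXP1_SOA_300, EXP2_SOA_800))
```

With these inputs the difference is 41.5 ms for words and 20.0 ms for pseudowords, in agreement with the rounded values of 42 ms and 20 ms given in the text.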

It would be useful to investigate the time course of low constraint facilitation in a naming
task as comparisons of lexical decision and naming are often informative (cf. West & Stanovich,
1982). In studies of associative priming, for example, deGroot (1984, 1985) has found that
facilitation of lexical decision does not increase significantly over SOAs but facilitation of naming
does. She suggests that "meaning integration" (the message processor) overshadows the effect of
context-induced attentional processing in lexical decision but in naming, which does not engage
the message level, the effect of attention can be seen to increase over SOAs. Failures to date
to find semantic priming of naming in Serbo-Croat (Katz & Feldman, 1983), however, prohibit
such a comparison here. Lupker (1984) has pointed out that so-called semantic priming actually
hinges on the associative relationship between the context and target. If this is controlled
completely, then purely semantic relationships would produce no facilitation. Comparing strong
and weak associates in a naming task would not address the issue of facilitation by low constraint
semantic contexts.

Nonetheless, the present results are consistent with a number of experiments that exploit the 
inflectional nature of Serbo-Croat in investigations of syntactical processing. Neither spreading 
activation nor a prelexical attentional type of priming is supported by a pattern of findings that 
militate strongly for post-access coherence checks. We will summarize the argument here but see 
Gurjanov et al. (1985b) for the complete line of reasoning. As already mentioned, the standard 
result is that the target in a grammatically congruent pair is evaluated more quickly than the 
target in a grammatically incongruent pair (e.g., Gurjanov et al., 1985a, 1985b; Lukatela et al., 
1982, 1983). Of particular interest is the fact that the magnitude of the grammatical congruency 
effect for adjective-noun pairs is matched by that found for pseudoadjective-noun pairs, both in 
visual (Gurjanov et al., 1985b) and auditory lexical decision (Katz, Boyce, Goldstein, & Lukatela, 
1987). The observed influence of a pseudoadjective on the processing of a noun could only have 
been achieved through a relating of their respective inflections. The information required in 
order that a syntactic device might evaluate such relations is of three kinds: (1) inflections must 
be distinguished from stems; (2) word class must be identified; and (3) word gender must be 
identified. These three kinds of information are made available by lexical access. 

What is the theoretical significance of low-constraint facilitation of word recognition? As 
the argument is usually developed, such effects are supposed to infirm models of autonomous 
processing because such effects imply that high level information is interacting with low level 
processes. In their summary of the issue, Sanocki et al. (1985, p. 147) observe: 

A facilitatory effect of low-constraint contextual information would be of particular 
theoretical interest, because it would implicate a linguistically powerful mechanism... 
A facilitatory effect of such a context would implicate a high-level mechanism that could 
affect more words than word level mechanisms (e.g., Becker, 1980; Neely, 1977) could 
affect. 

Forster, architect of perhaps the strongest autonomous model, also sees low constraint sentences
in the same light: "This theory clearly requires that sentence contexts should not influence
lexical processing (either positively or negatively)" (1981, p. 471). We agree that a model of
autonomous processing cannot accommodate such effects on lexical processing, but we do not
agree that the existence of low constraint context effects necessarily implies the existence of "a
linguistically powerful mechanism" that is, indeed, influencing lexical processing. Rather, the
message processor does its evaluation on the basis of information available in the lexical entries
of the accessed words. As Forster (1979) has pointed out, this may require a reconceptualization
of the kind of information that is thought to be contained in the lexicon. The automaticity of
sentence context effects, especially as evidenced by their stability over SOAs, may demand such
a reconceptualization.

In the model advocated here, sentence context effects arise because of the integrity of the
language processor, which cannot short circuit its own style of normal language comprehension.
That is, the decision making device ordinarily must use the outputs of all three subprocessors in
order to understand sentences. Negative bias from any level may be "a signal that perception
or comprehension has failed and that some reanalysis is called for" (Fischler & Bloom, 1979,
p. 224; see also Kinoshita, Taft, & Taplin, 1985). For example, one might be alerted to an
unfamiliar or inappropriate word or to a questionable syntactic construction (e.g., is a double
negative intentional?). These effects are decidedly post-lexical but they are no less automatic
because of it.



THE USE OF MORPHOLOGICAL KNOWLEDGE IN SPELLING
DERIVED FORMS BY LEARNING-DISABLED AND NORMAL
STUDENTS

Joanne F. Carlisle†

Abstract. Currently popular systems for classification of spelling
words or errors emphasize the learning of phoneme-grapheme
correspondences and memorization of irregular words, but do not take
into account the morphophonemic nature of the English language. This
study is based on the premise that knowledge of the morphological rules
of derivational morphology is acquired developmentally and is related
to the spelling abilities of both normal and learning-disabled (LD)
students. It addresses three issues: 1) how the learning of derivational
morphology and the spelling of derived words by LD students compares
to that of normal students; 2) whether LD students learn derived forms
rulefully; and 3) the extent to which LD and normal students use
knowledge of relationships between base and derived forms to spell
derived words (e.g., "magic" and "magician"). The results showed that
LD ninth graders' knowledge of derivational morphology fell between
that of normal sixth and eighth graders, following similar patterns of
mastery of orthographic and phonological rules, but that their spelling
of derived forms was equivalent to that of fourth graders. Thus, they
know more about derivational morphology than they use in spelling.
In addition, they were significantly more apt to spell derived words
as whole words, without regard for morphemic structure, than even
the fourth graders. Nonetheless, most of the LD spelling errors were
phonetically acceptable, suggesting that their misspellings cannot be
attributed primarily to poor knowledge of phoneme-grapheme
correspondence.

Introduction 

In order to gain insight into the nature of spelling abilities and disabilities, we must have an
approach to classifying words and/or spelling errors that reflects a model of the spelling process
and hypotheses about the nature of spelling disabilities. Currently, the most popular model of
the process of spelling includes two distinct systems for spelling a word: a "whole word" system,
which is dependent on recall of the word as a gestalt, and a "correspondence" system, which is
dependent on knowledge of the ruleful relationships between sounds and letters. While this
dual-system model, which can be termed a "phonetic"/"nonphonetic" model, has provided insight into
certain aspects of spelling disabilities, it does not take into account the morphemic structure of
words. For a complete understanding of the linguistic deficits of disabled spellers, we must take
into consideration students' acquisition of morphological knowledge, as well as their ability to use
this knowledge in spelling.

† Also American International College

Described by a variety of terms (e.g., "regular" and "irregular," or "predictable" and
"unpredictable"), the "phonetic"/"nonphonetic" approach has become the theoretical basis for
extensive research on and diagnostic analysis of spelling disabilities (Barron, 1980; Boder, 1973;
Boder & Jarrico, 1982; Camp & Dolcourt, 1977; Carpenter & Miller, 1982; Cook, 1981; Frith,
1980; Goyen & Martin, 1977; Holmes & Peper, 1977; Jorm, 1981; Moats, 1983; Nelson, 1980;
Sweeney & Rourke, 1978; Whiting & Jarrico, 1980). Although the results of these investigations
are not completely consistent (see Holmes & Peper, 1977), they have resulted in a consensus
that learning-disabled or dyslexic spellers are apt to have a primary deficit that corresponds to
one of the two systems: "phonetic" spelling or memory for "nonphonetic" words. Perhaps as a
result, the "phonetic"/"nonphonetic" distinction has been used as the basis for diagnostic tests
that have become popular in the last ten years, including Larsen and Hammill's Test of Written
Spelling (1976) and Boder's Test of Reading-Spelling Patterns (Boder & Jarrico, 1982). Boder
(1973) argues that the prevalence of one of the two error types ("phonetic" and "nonphonetic")
can be used to classify dyslexics into subgroups. By this system, spellers who cannot render
words with phonetic accuracy are classified as "dysphonetic" and those who do not recall the
configuration and characteristic visual features of words are classified as "dyseidetic", although
it is possible to have both kinds of deficit and be placed in a "mixed" category.

This method of diagnosing types of disabled spellers has several important shortcomings. 
First, the strict dichotomy requires that all words (or misspellings of words) be classified as either 
"phonetic" or "nonphonetic." Because any word that is not completely regular phonetically must 
be considered "nonphonetic," the class of words considered "nonphonetic" becomes very large 
and heterogeneous. In the Test of Written Spelling (Larsen & Hammill, 1976), "myself" and
"everyone" are included in the list of "Unpredictable" words, even though each is a compound 
of two very common morphemes, "my" and "self," "every" and "one." In fact, these two words 
pose quite a different challenge for young spellers than other "Unpredictable" words on the same 
list, such as "music" and "campaign." 

Second, the phonetic approach misrepresents the nature of our writing system. "Phonetic" 
spelling places emphasis solely on the phoneme as the unit of language, and analysis of words or 
spelling errors focuses on the letter or letters that can be used to spell each phoneme accurately. 
While knowledge of sound-to-letter correspondences and memorization of "nonphonetic" words 
are necessary, these are not the only sources of knowledge spellers need to bring to the task. For 
accurate spelling children must also use knowledge of grammatical structure and knowledge of 
orthographic and morphological patterns and rules, even in the first few years of school (Chomsky, 
1970; Hanna, Hodges, & Hanna, 1971; Marino, 1979; Schwartz & Doehring, 1977).

Of specific concern here is the fact that the "phonetic"/"nonphonetic" system ignores the
large role that morphemic structure plays in the formation of English words. The nature of our 
language is such that phonemes and morphemes are intricately embedded, so that the English 
language is accurately described as "morphophonemic" (Chomsky & Halle, 1968). In fact, analysis 
of errors at the "letter level" must be sensitive to students' knowledge of the structure of words to 
be meaningful. For example, in an analysis of errors made on "ie" words by junior high students 
(Carlisle & Liberman, 1983), transpositions of "ie" were found to be very common in words like 
"chief" and "belief," but nonexistent in words like "babies" or "parties." The reason may be 
that the linguistic role of "ie" in these words is quite different. The "ie" in "chief" falls within 
a single base morpheme, whereas the "ie" in "babies" occurs at the morphemic boundary, the 
point at which the plural marker "s" is added to the base "baby." Even the poorest spellers did
not spell "babies" "babeis"; their misspellings of the "ie" were commonly "babes" and "babys". 
Ordinarily, analysis of "letter level" errors does not take into consideration students' knowledge 
of the morphemic structure of words. 

While researchers believe that students must use morphological knowledge to be successful 
in reading and spelling (Chomsky, 1970; Hodges & Rudorf, 1966; Liberman, 1982; Venezky, 1970; 
Venezky & Weir, 1966), we know little about how children learn to use morphological knowledge, 
particularly in spelling. We know more about how inflected forms are learned than how derived 
forms are learned. By the age of seven, children generally use inflected forms rulefully in speaking 
(Berko, 1958; Selby, 1972). These forms include the verb tense markers (e.g., "-ed," "-ing"), 
the "s" plural and possessive markers, and so on. The derived forms are learned later and more 
slowly, starting with the more common regular forms such as "foggy" (the adjectival form of "fog")
and "slowly" (the adverbial form of "slow") and progressing to forms that undergo phonological
changes (as in "magic" and "magician") (Berko, 1958; Derwing, 1976; Derwing & Baker, 1979).

Learning derived forms is more difficult than learning inflected forms for several reasons.
One reason is that inflected forms are more common, perhaps because they are necessary for the
grammar of the language. Learning inflected forms is a more integral part of language acquisition
than learning derived forms. In addition, while the phonological shifts from base to derived forms
are often ruleful (Chomsky & Halle, 1968), they are complex and sometimes seemingly arbitrary.
For example, "deep" becomes "depth," but "steep" does not become "stepth." Furthermore, 
word-specific knowledge seems to play a larger role in learning derived forms than in learning 
inflected forms (Klima, 1972; Smith & Sterling, 1982). Such knowledge includes the particular 
suffix used to form a given derived word. For example, formation of a noun from an adjective 
may be accomplished by adding on "-ness," "-ment," or "-ity." Sometimes two grammatically 
identical forms exist in the language, varying only slightly in meaning (e.g., "bountiful" and 
"bounteous"). Linguistic rules do not consistently specify the exact forms of derived words found 
in the language. 

Learning to read is believed to help the child acquire the derived forms as patterns or word 
families. The orthography preserves the identity of the word, even when phonological changes 
take place (e.g., "equal," "equality"). In addition, some orthographic shifts can be learned as 
patterns (e.g., "divide" and "division," "decide" and "decision") (Chomsky, 1970; Templeton,
1980). It is not surprising, then, that good readers have been shown to have a more thorough
knowledge of derived forms than poor readers (Barganz, 1971; Freyd & Baron, 1982).

Children's ability to spell derived forms has received less attention. We know that children
begin to learn the patterns of morphemically complex words in their first years in school (Schwartz 
& Doehring, 1977). For instance, as early as first grade, linguistically mature students spell words 
that sound alike (e.g., "wind" and "pinned") in ways that reflect differences in morphological 
structure (Rubin, 1984). Still, while these early studies suggest that spelling of inflected forms is 
learned rulefully, they do not speak directly to the issue of how children go about spelling derived 
forms. It is possible that derived forms are spelled as whole words, without reference to their 
morphological structure. Support for this position comes from Sterling (1983), who has found 
patterns of errors indicating that inflected forms are learned rulefully, while derived forms are 
learned as whole and independent words. The alternative is that some spellers, at least, spell 
derived words by making use of knowledge of the morphemic structure of the word. We might 
suspect that better spellers would make superior use of knowledge of derivational morphology than 
poor spellers. There is some evidence to support this hypothesis. Several researchers (Fischer, 
Shankweiler, & Liberman, 1985; Templeton, 1980; Templeton & Scarborough-Franks, 1985) have 
provided evidence that good spellers, particularly at high school and college levels, have superior 
knowledge of phonological and orthographic rules. 

Poor spellers may lack linguistic knowledge, but their weaknesses are not just at the level of 
representing phonemes. We have evidence that poor spellers spell inflected and derived words with 
a high degree of phonetic accuracy but have difficulty adding suffixes to base words accurately 
(Carlisle, 1984). We do not know whether they lack morphological knowledge or simply the 
ability to use that knowledge in spelling. In a study of the spelling of good and poor junior-high
spellers, some students wrote "easally" for the word "easily," given the sentence, "Our team won
the race ____." And some wrote "finely" for "finally," given the sentence, "I have finished my
lesson." We do not know whether these students know that "final" is the base word of "finally" or
that "ease" and "easy" are in the same word family. In fact, to understand such spelling errors, 
we must know whether students at this level lack knowledge of morphological relationships, or 
whether they do not think to use this knowledge in spelling derived words. 

The design of the present study reflects the belief that in order to understand the full range
of spelling capabilities of disabled spellers, we need to learn more about the knowledge of the
morphemic structure of both normal and disabled spellers. In an earlier study, students in the
fourth, sixth, and eighth grades were selected to investigate the normal developmental learning of
derivational morphology and the ability to spell derived forms. For the present study, a group of
learning-disabled ninth-grade students with spelling disabilities were selected for comparison to
the normal students. The ninth-grade level was chosen in light of the findings of previous studies
indicating that dyslexic or learning-disabled students were commonly three to five years delayed
in their acquisition of spelling skill and morphological knowledge (Moats, 1983; Wiig, Semel, &
Crouse, 1973). Thus, it was estimated that the ninth-grade LD students might developmentally
resemble the fourth or sixth graders in the acquisition of derivational morphology and the spelling
of derived words.

Initially, a study was undertaken to investigate 1) the developmental learning of derivational
morphology and its rule systems (phonological and orthographic rules) by normal children in
grades four, six, and eight and 2) the extent to which these students use knowledge of
morphological relationships in their spelling of derived words. The purpose of the present study was
to determine the extent to which LD students' learning of derivational morphology and spelling
of derived words differed from that of the normal students. This study was designed to address
three questions: First, do LD students know and use rules of derivational morphology in the same
way as do peers at a similar level of spelling ability? Second, do the LD students appear to be
learning the underlying phonological and orthographic rules of derivational morphology? And,
third, do LD students use their knowledge of the morphemic structure when they spell derived
words?



Method

The description of the present study has included the normal groups (fourth, sixth, and eighth
graders) of the first study (Carlisle, 1985) for purposes of comparison. The study was designed
to determine whether learning-disabled (LD) students showed similar or different patterns of
learning derivational morphology and spelling derived forms.

Subjects

The normal students were fourth, sixth, and eighth graders who were members of classes
studying reading or language arts in a rural school system. There were 22 fourth graders, 22
sixth graders, and 21 eighth graders; all students were reported by their teachers to have normal
intelligence. The LD students were ninth graders who attended a rural private high school with a
specific program of remedial training for LD students. The 17 students who participated were all
previously evaluated and determined to have specific learning disabilities in reading and written
language skills. The mean intelligence quotient of these students was reported by the school to
be 107.

The Wide Range Achievement Spelling subtest (Jastak & Jastak, 1978) was used to compare
the groups on spelling ability. As Table 1 shows, the LD ninth graders' mean score closely
resembled that of the fourth graders. The LD ninth graders' performance did not differ significantly
from that of the fourth graders, t(37) = 0.08, p = 0.937, but did differ significantly from that of
the sixth graders, t(37) = 2.14, p < .05, and the eighth graders, t(36) = 8.99, p < .001.

Table 1

Performance on Wide Range Achievement Test (WRAT) Spelling Subtest, by Grade Level

Group     Mean GE (and SD)          Range
4N        5.9 (1.0)                 3.9 - 8.1
6N        6.7 (1.4)                 3.9 - 8.9
8N        9.4 (1.3)                 6.7 - 10.9
9LD       5.9 (SD not legible)      [range not legible in source]

Instruments

The following tests were administered: 

1) The Wide Range Achievement Test (WRAT), Spelling subtest (Jastak & Jastak, 1978): 
This standardized spelling test was used to determine the spelling abilities of the four groups and 
to determine the validity of the experimental Spelling Test. The correlation between performance 
on the WRAT Spelling Test and the Spelling Test, Derived Forms subtest, was .74 (p < .001) for
the fourth, sixth, and eighth graders. 

2) The Test of Morphological Structure (TMS): This is a test of oral generation designed
to assess knowledge of derivational morphology. It has two subtests, each with 40 items. The
Derived Forms subtest requires that the student provide the appropriate derived form, given the
base form of the word and a short sentence. The Base Forms subtest requires the student to
supply the base form, given the derived form and a short sentence. In both cases, the word the
student supplies is the final word of the sentence. For example, the first item on the Derived
Forms subtest is: "Warm. He chose the jacket for its ____." The target response is "warmth."
The first item on the Base Forms subtest is: "Growth. She wanted the plant to ____." The target
response is "grow."

The words on this test reflect four types of relationship in the transformation from base to
derived forms. These are as follows: No Change in phonology or orthography (for example, "enjoy"
to "enjoyment"); Orthographic Change only (for example, "sun" to "sunny" or "rely" to "reliable");
Phonological Change only (for example, "magic" to "magician" or "sign" to "signal"); and Both
Changes, orthographic and phonological (as in "deep" to "depth" or "decide" to "decision")
(see Carlisle, 1985, for further description of the construction of this test). The ten base words
included under each type of transformation were equated for word length and word frequency on
both subtests of the TMS (Base Forms and Derived Forms) (Carroll, Davies, & Richman, 1971).
The same procedure was used to equate the derived words under each type of transformation
on each TMS subtest for word length and word frequency. The test was administered by means of a
tape recording of a male native speaker of American English.

3) The Spelling Test (ST): This experimental test is a test of dictated spelling, consisting
of two parts, a Derived Forms and a Base Forms subtest, each with forty items. The student
was presented with the word, a sentence containing the word, and then the word again. For
example, the first item of the Derived Forms subtest is: "Explanation. The explanation was long.
Explanation."

The words on the ST are the same words (base and derived forms) that comprise the Derived
Forms subtest of the TMS; altogether there are forty pairs of words. Including pairs of base and
derived forms allows for analysis of students' use of morphological knowledge in spelling. If a
derived word is spelled by reference to its morphemic structure, a prerequisite must be the ability
to spell the base form correctly. Alternatively, if the spelling of each of the two forms (base and
derived) is learned independently (i.e., as whole words), we would expect that in some cases the
derived form would be spelled correctly while the base form would be misspelled. Thus, the ST
was constructed to examine the extent to which successful spelling of a base form was related to
successful spelling of its derived counterpart. The test was administered by means of a tape
recording of a male native speaker of American English.

4) The Test of Suffix Addition (TSA): This experimental test is a paper-pencil task that
requires the students to combine a base word and a suffix, following the rules that govern the
addition of suffixes to words. The test was designed to explore students' knowledge of the
orthographic transformations between base and derived words. There are 30 items on the test. The
base words are nonsense words, made by changing one consonant or consonant blend of a real
word. The suffixes are real. For example, the first item is as follows: 1. dun + y = ____.
Nonsense words were used in order to have a relatively pure test of the students' ability to apply
suffix addition rules. The students could not simply know how to spell the whole word. Knowledge of
three orthographic rules was evaluated: those governing the addition of suffixes to words ending
in silent "e," to words ending in "y," and to words ending in a single consonant.
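
The three orthographic rules just named can be illustrated with a short sketch. It is written for this illustration only and is not the scoring procedure of the TSA: the function name `add_suffix` and the exact rule conditions are assumptions, the rules are deliberately simplified, and the consonant-doubling rule is meant only for one-syllable bases such as the nonsense item "dun."

```python
VOWELS = set("aeiou")


def add_suffix(base: str, suffix: str) -> str:
    """Join a base word and a suffix using three simplified spelling rules.

    Hypothetical helper for illustration; real English spelling has many
    exceptions that this sketch does not handle.
    """
    # A suffix counts as vowel-initial if it starts with a vowel letter or "y".
    vowel_initial = suffix[0] in VOWELS or suffix[0] == "y"

    # Rule 1: drop a silent final "e" (consonant + "e") before a vowel-initial suffix.
    if (base.endswith("e") and len(base) > 2
            and base[-2] not in VOWELS and vowel_initial):
        return base[:-1] + suffix

    # Rule 2: change a final "y" that follows a consonant to "i",
    # unless the suffix itself begins with "i".
    if (base.endswith("y") and len(base) > 1
            and base[-2] not in VOWELS and not suffix.startswith("i")):
        return base[:-1] + "i" + suffix

    # Rule 3: double a final single consonant that follows a single vowel
    # (consonant-vowel-consonant ending) before a vowel-initial suffix.
    if (len(base) >= 3 and vowel_initial
            and base[-1] not in VOWELS and base[-1] not in "wxy"
            and base[-2] in VOWELS and base[-3] not in VOWELS):
        return base + base[-1] + suffix

    # Otherwise the suffix is simply appended.
    return base + suffix


print(add_suffix("dun", "y"))      # doubling rule
print(add_suffix("rely", "able"))  # y-to-i rule
print(add_suffix("bope", "ing"))   # silent-e rule on a made-up base
```

Because the base words on the TSA are nonsense words, a rule-based procedure of this kind is the only route to a correct answer, which is the point of the test design.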

Procedures 

In both phases of the study, the students were administered the Wide Range Achievement
Test (WRAT), Spelling subtest, and the three experimental tests described above: 1) the Test
of Morphological Structure (TMS), 2) the Spelling Test (ST), and 3) the Test of Suffix Addition
(TSA). First, the WRAT, Spelling subtest, and the ST, Derived Forms subtest, were administered
to each grade-level group. Two to three weeks later, the ST, Base Forms subtest, and
the TSA were administered to each grade-level group. (The Derived Forms subtest of the ST
was administered before the Base Forms subtest so that the students would not be given the
advantage of practice in spelling the base forms prior to spelling the derived forms.) One
to two weeks later, the TMS was administered to each student individually.

Results 

Performances of LD and Normal Students on the Experimental Tests 

The first research question asked how the learning of derivational morphology and spelling
of derived words by LD ninth-graders compared with that of normal students. This question
was addressed by examining the students' performances on the Test of Morphological Structure
(TMS), the Spelling Test (ST) and the Test of Suffix Addition (TSA), as shown in Table 2.
On the TMS, the normal students showed clear developmental trends in their generation of
the base and derived words, while the LD ninth graders' performance fell between the sixth-
and eighth-grade levels. An analysis of variance showed significant differences between the
groups on both the Derived Forms subtest, F(3,78) = 18.914, p < .001, and the Base Forms
subtest, F(3,78) = 16.879, p < .001. On the Base Forms subtest post hoc analysis (Scheffe,
p < .05) revealed that significant differences existed between all of the groups (the direction of
the difference is indicated by the symbol <): 4N < 6N < 9LD < 8N. On the Derived Forms
subtest the LD students' performance did not differ significantly from that of the sixth graders:
4N < 6N = 9LD < 8N (Scheffe, p < .05).
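
The degrees of freedom in these F ratios, and in the t tests reported for the WRAT comparison, can be checked against the group sizes given under Subjects (22 fourth graders, 22 sixth graders, 21 eighth graders, and 17 LD ninth graders). The sketch below is only an arithmetic check; the function names are hypothetical and it assumes a one-way analysis of variance and independent-samples t tests with pooled variance.

```python
# Group sizes as reported under Subjects.
GROUP_SIZES = {"4N": 22, "6N": 22, "8N": 21, "9LD": 17}


def anova_df(sizes):
    """Return (between-groups df, within-groups df) for a one-way ANOVA."""
    k = len(sizes)          # number of groups
    n_total = sum(sizes)    # total number of students
    return k - 1, n_total - k


def t_test_df(n1, n2):
    """Degrees of freedom for an independent-samples t test (pooled variance)."""
    return n1 + n2 - 2


print(anova_df(list(GROUP_SIZES.values())))                 # matches F(3,78)
print(t_test_df(GROUP_SIZES["9LD"], GROUP_SIZES["4N"]))     # matches t(37)
print(t_test_df(GROUP_SIZES["9LD"], GROUP_SIZES["8N"]))     # matches t(36)
```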

Developmental trends in the ability to spell base and derived forms were evident from the
normal students' performance on the two subtests of the ST, while the performance of the
LD ninth graders resembled that of the fourth graders (see Table 2). An analysis of variance
showed significant differences in performance of the groups on the Base Forms subtest,
F(3,78) = 20.424, p < .001, and on the Derived Forms subtest, F(3,78) = 27.963, p < .001.
A comparison of the performance of the groups (Scheffe, p < .05) indicated that on both the
Base Forms subtest and the Derived Forms subtest, the LD ninth graders' performance did not
differ significantly from that of the fourth graders: 4N = 9LD < 6N < 8N. Performance on
the TSA indicated a somewhat different developmental trend. Although an analysis of variance
showed significant difference between the groups, F(3,78) = 6.017, p < .001, the fourth graders'
performance did not differ significantly from that of the sixth graders, and the LD ninth graders'
performance did not differ significantly from that of either the fourth or sixth graders (Scheffe,
p < .05). Thus, knowledge of the rules that govern the addition of suffixes improved significantly
only between the sixth and eighth grades: 4N = 9LD = 6N < 8N.

Table 2

Performance on Experimental Tests of Morphological Structure (TMS),
Spelling (ST), and Suffix Addition (TSA): Means and SDs

                  TMS*                   ST*             TSA**
          Derived     Base       Derived     Base
4N         27.0       30.8        14.5       24.9        16.0
          (5.6)      (6.9)       (9.7)      (9.3)       (4.0)
6N         32.2       35.6        26.0       34.2        17.9
          (3.5)      (3.7)       (7.5)      (4.1)       (3.3)
8N         36.0       39.4        34.4       38.2        21.0
          (2.1)      (0.7)       (5.3)      (3.0)       (3.7)
9LD        33.0       37.8        16.8       28.1        17.5
          (3.2)      (2.1)       (7.1)      (5.8)       (4.9)

* Maximum possible = 40
** Maximum possible = 30

Discriminating the Groups by the TMS and ST Subtests 

While the above analyses indicated the group differences on the Derived Forms and Base
Forms subtests of the TMS and ST, they left open the question of which subtests best differentiate
the groups. To address this question, the students' scores on these four subtests were subjected to
a stepwise discriminant function analysis. Table 3 shows the standardized canonical coefficients
for the two significant functions that were generated. For the first function, the coefficients were
high for the subtests that measure morphological knowledge (the TMS Base Forms and Derived
Forms and the ST Derived Forms); this function accounted for 71.52% of the variance (p < .001).
The second function, explaining an additional 24.21% of the variance, for a total of 95.73%, was
barely significant (p = 0.05). The highest coefficient was on the TMS, Base Forms subtest. The
first function reflects group differences in knowledge of derived morphology. The second function
may reflect word knowledge or vocabulary development.



Table 3

The Standardized Canonical Coefficients of the Stepwise Discriminant
Function Analysis of the Subtests of the Test of Morphological Structure (TMS)
and the Spelling Test (ST)

Subtests*         Function 1     Function 2
ST, Derived         0.95735       -0.62458
TMS, Base          -0.57937        1.16453
TMS, Derived        0.70170        0.09103
ST, Base            0.01884       -0.14143

* Subtests are given in order of entry in the analysis.



Ruleful Learning of Derivational Morphology 

The second question addressed by this study was whether LD students' learning of derivational
morphology reflects the ruleful nature of the morphological transformations between base
and derived forms. To investigate this issue, performance of the groups was analyzed on the
basis of the four types of transformation from base to derived forms. The four types of
transformations between base and derived forms, "No Change" (NC), "Orthographic Change Only"
(OC), "Phonological Change Only" (PC), and "Both Orthographic and Phonological Changes"
(BC), were equally represented on the TMS subtests.

An analysis of variance showed that the four groups differed significantly in their performance 
on each of the transformations on the TMS Derived and Base subtests; the univariate F ratios 
were all highly significant (see Table 4). Of particular interest is the fact that the pattern of 
performance across word types was very similar for the four groups, as can be seen in Figure 1.
These graphs illustrate several results of note. First, the students consistently made the most
errors on words that undergo phonological change or both phonological and orthographic changes. 
Second, the LD ninth graders' pattern of performance on the different transformations was quite 
similar to that of the normal students, indicating a lag in their mastery of the transformations, 
but not a noticeably different pattern in their learning of the four types of transformations in 
derivational morphology. 

Table 4

Univariate F Ratios of the Transformations on the Base Forms and
Derived Forms of the Test of Morphological Structure (TMS)

                            F-Ratio**
Base Forms
  No Change                   7.788*
  Orthographic Change         6.559*
  Phonological Change        15.300*
  Both Change                11.850*
Derived Forms
  No Change                   9.719*
  Orthographic Change         9.224*
  Phonological Change         9.593*
  Both Change                19.560*

* p < .0005
** With 3 and 78 degrees of freedom.

[Figure 1 graphs not reproduced: two panels, "TMS, Base Forms" and "TMS, Derived Forms"; vertical
axis, number of errors; horizontal axis, transformation type (NC, OC, PC, BC).]

Figure 1. Mean errors on four types of transformation, No Change (NC), Orthographic Change (OC),
Phonological Change (PC), and Both Change (BC), on the Test of Morphological Structure (TMS)
Base Forms and Derived Forms subtests.

The Spelling of Base-Derived Word Pairs

The third question addressed by this study was whether LD students spell derived words with
reference to their morphemic structure. Toward this end, the spelling of the base and derived word
pairs on the ST were scored according to the four possible patterns of performance: Both Incorrect
(e.g., "equl" and "eqalty"), Base Correct/Derived Incorrect (e.g., "equal" and "eqalty"), Derived
Correct/Base Incorrect (e.g., "equality" and "equl"), and Both Correct ("equal" and "equality").
The proportion of overall performance for each pattern is given for each of the four groups in
Figure 2.

Figure 2. Spelling performance on pairs of base and derived words (expressed as % of opportunity).
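
The four-way scoring of base and derived pairs can be sketched as follows. This is an illustration under stated assumptions rather than the study's scoring program: the function names and the data layout are hypothetical, a spelling counts as correct only if it matches the target exactly, and "% of opportunity" is read here as the percentage of all scored pairs.

```python
from collections import Counter

# The four patterns named in the text.
PATTERNS = (
    "Both Incorrect",
    "Base Correct/Derived Incorrect",
    "Derived Correct/Base Incorrect",
    "Both Correct",
)


def classify_pair(base_correct: bool, derived_correct: bool) -> str:
    """Assign one base/derived pair to one of the four patterns."""
    if base_correct and derived_correct:
        return "Both Correct"
    if base_correct:
        return "Base Correct/Derived Incorrect"
    if derived_correct:
        return "Derived Correct/Base Incorrect"
    return "Both Incorrect"


def pattern_percentages(responses):
    """Percentage of pairs falling in each pattern.

    `responses` holds tuples of
    (base_target, base_attempt, derived_target, derived_attempt).
    """
    counts = Counter(
        classify_pair(b_att == b_tgt, d_att == d_tgt)
        for b_tgt, b_att, d_tgt, d_att in responses
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pairs to score")
    return {p: 100.0 * counts[p] / total for p in PATTERNS}


# The four example spellings given in the text, one per pattern.
examples = [
    ("equal", "equl", "equality", "eqalty"),
    ("equal", "equal", "equality", "eqalty"),
    ("equal", "equl", "equality", "equality"),
    ("equal", "equal", "equality", "equality"),
]
print(pattern_percentages(examples))
```

Under the whole-word account, the Derived Correct/Base Incorrect pattern should occur with some frequency, whereas spelling by morphemic structure predicts that it should be rare.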

Of particular interest are two of the categories, Base Correct/Derived Incorrect and Derived
Correct/Base Incorrect, as they suggest the extent to which knowledge of the spelling of the base
form is related to knowledge of the spelling of the derived form. An analysis of variance showed
that the groups differed significantly on these two measures (Base Correct/Derived Incorrect,
F(3,78) = 24.414, p < .001; Derived Correct/Base Incorrect, F(3,78) = 11.303, p < .001). Paired
comparisons (Scheffe, p < .05) indicated that the LD ninth graders had significantly more pairs
that fell in the Base Correct/Derived Incorrect category than any of the other groups: 9LD >
4N > 6N > 8N. Similarly, the LD ninth graders also had significantly more pairs that belonged
to the Derived Correct/Base Incorrect pattern: 9LD > 4N = 6N > 8N. Together, these findings
indicate that the LD ninth graders more frequently spelled correctly ONE of the pair (base or
derived word) than did the normal students, including the fourth graders.



Discussion 



Comparison of LD and normal students' performances on the tests of morphological knowledge
and spelling of base and derived forms has confirmed several of the initial expectations.
First, youngsters normally learn a great deal about derivational morphology between the fourth
and eighth grades. The performance of the ninth-grade LD students suggests that while they
are experiencing a lag in their mastery of derivational morphology, their pattern of learning the
underlying phonological and orthographic rules resembles that of the normal students. Second,
while both normal and LD students know more about morphological relationships than they use
in spelling derived forms, the gap is more pronounced for the LD students. The normal students'
spelling of base and derived word pairs shows that they spell many derived words by using
knowledge of morphemic structure. This is not the case for the LD students. However, a post hoc
examination of the LD students' spelling errors suggests that their difficulties spelling derived
forms cannot be attributed solely to lack of mastery of phoneme-grapheme correspondence rules.

The Learning of Derivational Morphology by Normal and LD Students 

Understanding the patterns of performance by the normal students provided a reference by 
which to evaluate the performance of the LD students. Clear developmental trends were evident
in both the oral generation of derived forms and the spelling of base and derived forms. Several 
points of particular interest might be emphasized here. First, on the Test of Morphological 
Structure (TMS), the students in all four groups consistently had an easier time when they were 
given the derived form (e.g., "growth") and were asked to supply the base form (e.g., "grow") 
than when they were given the base form (e.g., "warm") and were asked to supply the appropriate
derived form (e.g., "warmth"). Extracting the base form is easier than generating the derived 
form. One of the central differences between the two tasks is that generating the derived form 
required some word-specific knowledge. Derivational rules cannot supply this particular kind
of knowledge. Specific word knowledge helps us know that "equality," not "equalness," is the 
noun form of "equal." It is not surprising that the students' ability to generate the correct 
derived form lagged behind their ability to extract the base word. In fact, this pattern confirms 
our impression at the outset of this study that word-specific knowledge plays a large role in 
the level of learning of derivational morphology. It also shows, however, that rules governing 
the relationships between base and derived forms are learned. A second trend of interest is that 
spelling base and derived forms consistently lagged behind the ability to generate the same words. 
Spelling is evidently the more difficult task. As we discussed earlier, spelling draws on knowledge 
of sound-letter correspondences, syntactic roles, and orthographic rules as well as on knowledge 
of the morphology. 

The particular concern of the present study is how the LD ninth graders compare to their
normal peers in mastering derivational morphology and spelling derived forms. First, the LD
ninth graders fell between the sixth and eighth graders on the TMS, resembling most closely
the eighth graders in knowledge of base forms and the sixth graders in knowledge of the derived
forms. In contrast, on the Base and Derived Spelling Test (ST) subtests, the LD ninth graders
performed very much like the fourth graders. Thus, while they evidently are delayed in their
acquisition of morphological knowledge, they are more seriously delayed in their mastery of the
spelling of both base and derived words. 

Ruleful Learning of Derivational Morphology 

The nature of the students' morphological knowledge was assessed to determine
the extent to which learning about derivational morphology is ruleful. This analysis was an
investigation of the number of errors on each type of transformation between base and derived
forms: "No Change," "Orthographic Change," "Phonological Change," and "Both Change."
Performances on both subtests of the TMS showed that for all of the groups, the number of
errors increased on the more complex transformations; that is, more errors were made on those
word pairs that undergo phonological or both phonological and orthographic changes than on
words that undergo no change at all or only an orthographic change. The error pattern across
transformations is consistent at each grade level; there is no interaction between type of
transformation and grade level. If ruleful learning did not take place, we would expect more or less
equal numbers of errors on the four types of transformations by group and by subtest. Thus, the
marked consistency of the pattern is a strong indication that the learning of derivational morphology
reflects the relative difficulty of learning the orthographic and phonological rules. The younger
students know many more "No Change" pairs than "Phonological Change" pairs. Where both
phonological and orthographic transformations occur between base and derived forms, learning
of the relationship between base and derived forms is not complete even by the eighth grade.

Spelling Base and Derived Word Pairs 

The performance of the LD ninth graders resembled that of the fourth graders on the spelling 
of both the base and derived words. Examination of the spelling of the pairs of base and derived 
words on the ST showed that the normal students used knowledge of word structure in spelling 
the derived forms, but that the LD students were less apt to use such knowledge in their spelling 
of derived forms. When the pairs of words (each base and its derived forms) were examined for 
error patterns (see Figure 2), one pattern emerged for normal students at all three grade levels.
The two components of this pattern were that (1) the higher the grade level, the fewer errors
on both members of the pair, base and derived, and (2) the derived form was seldom spelled
correctly if the base word was misspelled; or, put another way, the students rarely spelled only
the derived word correctly. Clearly, for normal students, knowing how to spell the base form (e.g., 
"equal") probably precedes and aids in learning to spell the derived form (e.g., "equality"). For 
these students, then, knowledge of the morphemic components does appear to be used in spelling 
dictated words. 

In contrast, the LD ninth graders were more apt to spell only one of the pair correctly, be
it the base form or the derived form. This means that in some cases, the base word was spelled 
incorrectly (e.g., "glorry"), but the derived word was spelled correctly (e.g., "glorious"). The fact 
that the number of base incorrect/derived correct errors is significantly greater for ninth-grade 
LD students than for normal fourth graders suggests that they were more apt to spell derived 
forms as whole words, without regard for the relationship to the base form or the morphemic 
transformation. Thus, even though the LD ninth graders' overall performance on the ST was at 
the same level as the fourth graders', they nonetheless showed less evidence of using morphological 
knowledge in spelling derived forms.

It seemed important to consider the possibility that the LD students' spelling errors could
be categorized in terms of the "phonetic"/"nonphonetic" dichotomy that is currently the most
popular system for specifying spelling disabilities. A post hoc tabulation of every spelling of
every derived word on the ST, Derived Forms subtest, was carried out at each grade level. The
misspellings were then analyzed by two judges to determine whether the misspellings were
reasonable phonetic versions of the dictated word. The general finding was that only a small proportion
of errors could be labeled phonetically unacceptable. As an example, Table 5 shows one of the
"Phonological Change" words, "magician." By examining all of the versions of spelling this word,
we see that almost all of the errors reflect difficulties learning the correct spelling of the suffix. As
we noted earlier, the LD students were roughly equivalent to sixth graders in their knowledge of
morphemic structure, but the misspellings illustrate that they were less able to use this knowledge
in spelling. All but about four of the LD students' misspellings must be considered phonetically
acceptable versions of the word. Thus, it seems that this group of LD students has acquired
basic knowledge of sound-letter correspondences. Still, as the sixth and eighth graders' spelling of
"magician" indicates, older and more capable spellers did not opt for the basic phonetic spellings
(e.g., "shun" for "cian" in "magician"). They seem to have learned to override the process of
direct phonetic representation when they have acquired productive understanding of morphemic
structure of the words they spell. In contrast, when phonological transformations occur, the LD
students were more apt to spell words phonetically, often without awareness of the relationship
to the spelling of the base word.

Table 5

All Spellings of the "Phonological Change" Word, "magician"

Grade:  4N (n=22)         6N (n=22)         8N (n=21)         9LD (n=17)

        magition     5    magician    17    magician    16    magition      3
        magician     3    macican      2    magican      2    magician      2
        magican      3    magision     2    magision     2    magicion      2
        migion       1    magition     1    magition     1    magishion     1
        migishon     1                                        migition      1
        mjshier      1                                        magication    1
        mudishon     1                                        midican       1
        magish       1                                        magishan      1
        magiton      1                                        meniton       1
        magishion    1                                        migertion     1
        sirajison    1                                        niajion       1
        machishan    1                                        machishon     1
        micgen       1                                        m— *          1
        macian       1
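
One rough way to make "awareness of the base word" concrete is to ask whether a written form keeps the letters of the base ("magic") intact. The sketch below is only an illustration, not the scoring used in the study: the prefix criterion and the names `preserves_base` and `share_preserving` are assumptions, and only the sixth- and eighth-grade columns of Table 5 are used.

```python
BASE = "magic"

def preserves_base(spelling, base=BASE):
    """True if the written form begins with the base word spelled intact."""
    return spelling.startswith(base)

# Counts copied from the 6N and 8N columns of Table 5.
sixth = {"magician": 17, "macican": 2, "magision": 2, "magition": 1}
eighth = {"magician": 16, "magican": 2, "magision": 2, "magition": 1}

def share_preserving(column):
    """Proportion of responses in a column that keep the base intact."""
    total = sum(column.values())
    kept = sum(n for word, n in column.items() if preserves_base(word))
    return kept / total

for label, column in (("6N", sixth), ("8N", eighth)):
    print(label, round(share_preserving(column), 2))
```

A purely phonetic form such as "magishan" fails this check, whereas "magicion" passes it even though the suffix is misspelled. That is the distinction the paragraph draws between knowing the base and knowing the suffix.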

In summary, this investigation of the spelling of derived words has found a noteworthy dis- 
crepancy between the LD students' ability to generate orally derived forms and their ability to 
spell derived forms. Spelling is clearly the more difficult task of the two, not only for the LD stu- 
dents but for the normal students as well. At all levels the students appear to know more about 
the morphemic structure of words than they use in their spelling. However, the gap between 
knowing derived words in spoken language and spelling them correctly is more pronounced for 
the LD students than it is for normal fourth, sixth, and eighth graders. This gap cannot solely 
be attributed to lack of understanding of basic phoneme-grapheme correspondences. Their mis- 
spellings, as a rule, are viable phonetic representations. Instead, they appear to lack awareness of 
the presence of base forms within derived counterparts, and they lack specific knowledge about
how to spell suffixes and how to attach suffixes to base words correctly. 


THE DEVELOPMENT OF MORPHOLOGICAL KNOWLEDGE
IN RELATION TO EARLY SPELLING ABILITY 



Hyla Rubin†



Abstract. This study assessed the morphological knowledge of kindergarteners
and first graders in relation to their early spelling ability.
Morphological knowledge was investigated because, in order to spell,
children need to understand that words are composed of morphemes
and phonemes, and because poor spellers have particular difficulty
with inflected forms of words. Kindergarteners and first graders were
grouped by their implicit understanding of morphology and were given
tests of dictated spelling and morphological analysis. First graders
with poor morphological knowledge omitted more inflectional morphemes
in spelling and were less able to identify base morphemes in
spoken words than kindergarteners and first graders with higher levels
of implicit morphological knowledge. The results demonstrate the
importance of morphological knowledge in the development of spelling
proficiency.

INTRODUCTION 

Children who demonstrate learning problems characteristically make errors when reading 
and spelling inflected and derived forms of words. They tend to omit and substitute inflectional 
markers and to substitute base words for derived words, or one derived form of a word for 
another. Although these errors are frequently documented in clinical case reports, there is little 
experimental research concerning morphemic errors in written language. The studies that do 
exist demonstrate that children with learning problems make more of these errors in spelling 
than other children (Anderson, 1982; Moran, 1981). However, possible reasons for the occurrence 
of these errors have not been addressed. 

The basis for such errors in spelling might fall into one of two categories. On the one hand, 
they might represent part of a general tendency to misspell words. If this is the case, omissions of 
inflectional endings, for example, might be but one instance of a more pervasive pattern of final 
consonant omissions. On the other hand, they might reflect an underlying deficit in morphological 
knowledge. If that is the case, children who make such errors in spelling would be expected to
perform poorly in their attempts to use morphological rules in spoken language or to analyze the 
internal structure of words. 

Although the relationship between morphological knowledge and spelling ability has not 
been examined directly, there is good reason to anticipate that children who make morphemic 
errors in spelling are indeed deficient in their underlying morphological skills. Several studies 
have demonstrated that children with reading problems have difficulty applying morphological 
rules to new words (Brittain, 1970; Doehring, Trites, Patel, & Fiedorowicz, 1981; Vogel, 1975, 
1983; Wiig, Semel, & Crouse, 1973). In all of these studies, morphological knowledge has been 
assessed by an elicited spoken language task that requires the application of basic inflectional 
and derivational rules of morphology to nonsense base words (Berko, 1958; Berry, 1966). This 
method is used in order to determine that children are actually applying the morphological rules 
that they have mastered and are not just producing memorized vocabulary items. It has been
found that normally developing children master these rules between the ages of four and seven
(Brown, 1973; deVilliers & deVilliers, 1973; Selby, 1972; Templin, 1957). In contrast, children
with learning problems develop morphological knowledge more slowly, although they are found 
to follow the same sequence in their rule acquisition. 

Although it has been demonstrated that children grouped by reading ability differ signifi- 
cantly in their use of inflectional morphemes, as measured by the Berko procedure, research has 
not yet examined whether morpheme use is directly related to other linguistic skills or why these 
relationships might exist. Since children with learning problems show a strong tendency to make 
morphemic errors in spelling as well as in reading, it is of particular interest to determine if 
there is a relationship between morphological knowledge and spelling ability. Since the English 
orthography is morphophonemic, like the spoken language it represents (Liberman, Liberman, 
Mattingly, & Shankweiler, 1980), spelling requires that the child understand that words are made 
up of morphemes, which, in turn, are made up of phonemes. Studies of spelling ability of college 
students indicate that poor spellers fail most dramatically on those words that require sensitivity 
to morphophonemic structure (Fischer, 1980; Hanson, Shankweiler, & Fischer, 1983). In addi- 
tion, examination of the spontaneous writing samples of learning disabled children and adults 
documents incorrect usage of both inflectional and derivational morphemes (Anderson, 1982; 
Liberman, Rubin, Duques, & Carlisle, 1985; Moran, 1981). Based on this evidence, a strong rela- 
tionship between the ability to use morphemes correctly in spoken and written language would be 
expected since morpheme use in either case would depend on the development of morphological 
rules and access to them in the lexicon. It would also be expected that morpheme use would 
depend, at the very least, on an implicit understanding of morphophonemic structure. However, 
the explicit understanding that words are made up of morphemes, which, in turn, are made up 
of phonemes, would clearly differentiate the proficient from the disabled writers. 

Previous research studies have demonstrated that the ability to analyze the internal structure 
of words explicitly is a critical component in learning to read (Blachman, 1983; Fox & Routh, 1980;
Liberman, Shankweiler, Fischer, & Carter, 1974; Lundberg, Olofsson, & Wall, 1980; Treiman &
Baron, 1981) and in learning to spell (Liberman et al., 1985; Perin, 1983; Zifcak, 1981). In the
reading studies, the ability to analyze spoken words into syllabic and phonemic segments has been
found to be highly related to letter naming and word recognition performance in kindergarten,
first- and second-grade children. In the spelling studies, phonemic segmentation ability has been
found to be significantly related to dictated spelling performance in kindergarteners (Liberman
et al., 1985), first graders (Zifcak, 1981), and adolescents (Perin, 1983). 

Research into the structural analysis of spoken words and its relationship to reading and 
spelling abilities has yielded valuable diagnostic and instructional information thus far. It is 
clear that children with reading and spelling problems are less able than their normally achieving 
peers to analyze spoken words into their constituent phonemes. However, many questions about
this relationship remain unanswered. To begin with, the ability to analyze spoken words into
their constituent morphemes has been barely examined. Since the English orthography, like the 
spoken language it represents, is morphophonemic, we need to investigate the ability to analyze 
the internal structure of words as a function of both morphemic and phonemic structure. 

Recent studies have begun to examine the explicit understanding of morphophonemic struc- 
ture in children. Derwing and Baker (1977, 1979) have investigated the development of morpheme 
identification ability in children in grades 3 through college. They provided the children with word 
pairs that were varied for semantic and phonetic similarity, such as teach-teacher, slip-slipper,
cup-cupboard, and moon-month. The children were required to read each pair and indicate if one word
"came from the other," using a 5-point scale to specify the degree of relatedness. Performance 
correlated with age and degree of semantic and phonetic relationship between the paired words. 
The authors concluded that morpheme recognition ability may develop as much through instruc- 
tional experience as through language acquisition and suggested that it would be difficult to sort 
out the contributions of these two sources of linguistic knowledge. 

Although this research into the explicit analysis of morphemic structure is provocative, similar 
studies have not been conducted with children who demonstrate learning problems or with children 
below third grade. It would be expected that if younger children were deficient in morpheme 
use, which would reflect their implicit awareness of morphological structure, they would also 
be deficient in their ability to recognize base morphemes within two-morpheme words, or their 
explicit awareness of morphological structure. If these abilities were found to be related to each 
other and to morpheme use in early spelling, it would be possible to demonstrate the necessity of 
helping young children develop sensitivity to morphemic structure through direct instruction. 

Therefore, the present study was designed to examine the relationship between implicit aware- 
ness of morphemic structure, as measured by the ability to apply morphological rules to new 
words, and explicit awareness of morphemic structure, as measured by the ability to identify base 
words within two-morpheme words. Furthermore, the relationship between performance on the 
spoken language tasks and the ability to represent base morphemes and inflectional morphemes 
in beginning attempts at spelling was investigated. 

Although previous studies that document morphemic errors in spelling analyzed spontaneous 
writing samples, it was not considered reasonable to elicit writing samples in the present study 
since the children tested were only in kindergarten and first grade. However, it was important
to select children of this age for several reasons. First of all, it was expected that they would
demonstrate sufficient variability in their levels of implicit and explicit awareness of morphological
structure of spoken words to enable us to learn more about the course of this development.
Secondly, previous studies of invented spelling (Read, 1971, 1975) have demonstrated that by
age five many children are able to analyze words into their constituent phonemes and use their
knowledge of letter names to "invent" written representations of the spoken words. By scoring
for the number of morphemes represented in writing rather than for correctness of spelling, it
seemed reasonable to use a dictated spelling task as an early indication of the ability to represent 
inflectional morphemes in written form. In this way, both spoken and written language measures 
of the morphological knowledge of young children could be obtained. Finally, this information 
could be used in future research to predict the course of morphemic development in the written 
language of children and adults. 

Method 

Subjects 

The subjects were children selected from four kindergarten classes and four first-grade classes
in a suburban Connecticut public school. The children eligible for testing were all those for whom 
parental permission was obtained. The available 128 children (59 kindergarteners and 69 first 
graders) demonstrated adequate vision and hearing and were judged to have normal intelligence 
by their classroom teachers and the school psychologist. During a one-week period, they were 
individually given the Berry-Talbott Test of Language (Berry, 1966), a measure of elicited mor- 
pheme production in spoken language. This test required them to apply basic inflectional and 
derivational rules of morphology to nonsense base words by completing spoken sentences when 
shown illustrative pictures. 

Four groups were formed by selecting those children from each grade who scored within the 
highest and lowest thirds of the distribution of scores on the Berry-Talbott Test of Language. The 
children from the highest third of the kindergarten and first-grade distributions will be referred
to as the high kindergarteners and high first graders. Similarly, the subjects from the lowest third
of the kindergarten and first-grade distributions will be referred to as the low kindergarteners and
low first graders. The mean age and test scores for each group are summarized in Table 1.



Table 1

Berry-Talbott Test of Language: Grouped Mean Score (and Standard Deviation)
for Kindergarteners and First Graders

                      Low            High           Low           High
                      Kindergarten   Kindergarten   First Grade   First Grade

n                     21             19             22            24
Berry-Talbott         10.8 (3.3)     24.7 (2.5)     14.1 (4.1)    28.0 (3.3)
Age (years-months)    5-5            5-5            6-5           6-5
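
The grouping step (keeping only the highest and lowest thirds of each grade's score distribution) can be sketched as follows. This is a minimal illustration under stated assumptions: the scores are invented, `split_extreme_thirds` is a hypothetical helper, and a simple rank-based cut is used because the report does not say how ties at the boundaries were handled.

```python
def split_extreme_thirds(scores):
    """Return (low, high): identifiers in the lowest and highest thirds.

    `scores` maps a child identifier to a test score. The middle third
    is dropped, as in an extreme-groups design.
    """
    ranked = sorted(scores, key=scores.get)  # ascending by score
    k = len(ranked) // 3                     # size of one third, rounded down
    low = ranked[:k]
    high = ranked[-k:] if k else []
    return low, high

# Invented scores for nine hypothetical children (not data from the study).
scores = {"c1": 8, "c2": 27, "c3": 15, "c4": 11, "c5": 30,
          "c6": 19, "c7": 22, "c8": 5, "c9": 25}
low, high = split_extreme_thirds(scores)
print("low third:", low)
print("high third:", high)
```

The split would be applied separately within each grade, which yields the four groups compared in the rest of the study.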



To determine if the children differed in their performance on the Berry-Talbott Test, an
analysis of variance was conducted. The analysis revealed a significant main effect of group (high,
low), F(1,82) = 347.16, MSe = 11.83, p < .001, and grade (kindergarten, first), F(1,82) =
19.92, MSe = 11.83, p < .001. There was no interaction between group and grade. Furthermore,
comparison tests revealed significant differences among the groups: the high first graders
performed better than the high kindergarteners, t(41) = 3.58, p < .001; the low first graders
performed better than the low kindergarteners, t(41) = 2.86, p < .007; and the high kindergarteners
performed better than the low first graders, t(39) = 9.49, p < .001.
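
As a rough check on these comparisons, a pooled-variance t statistic can be recomputed from the rounded means and standard deviations in Table 1. The sketch assumes an ordinary independent-samples t test, which the text does not state explicitly; `pooled_t` is a hypothetical helper name.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic using a pooled variance estimate.

    Returns (t, degrees_of_freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Table 1 values: high first graders vs. high kindergarteners.
t, df = pooled_t(28.0, 3.3, 24, 24.7, 2.5, 19)
print(f"t({df}) = {t:.2f}")
```

With the rounded summary values this gives a t of about 3.6 on 41 degrees of freedom, close to the reported t(41) = 3.58. Small discrepancies of this size are expected from rounding in the table.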






Materials and Specific Procedures 

(1) Experimental Spelling Test. This measure was designed to assess the children's representation
of base and inflectional morphemes in the early stages of their experience with written
language. It contained 31 words that were considered to be part of the average kindergartener's
spoken vocabulary. Twenty-one words were organized according to morphemic structure (one
or two morphemes) and type of final consonant cluster (nasal or non-nasal). Three experimental
words were given in each of the following categories: (1) 2-morpheme words ending in md
(hummed, jammed, dimmed), (2) 1-morpheme words ending in nd (wind, band, kind), (3) 2-morpheme
words ending in nd (pinned, canned, lined), (4) 1-morpheme words ending in nt (tent,
pant, hint), (5) 2-morpheme words ending in nt (bent, can't, don't), (6) 1-morpheme words ending
in st (list, dust, nest), and (7) 2-morpheme words ending in st (kissed, fussed, messed). Ten words
were used as fillers to reduce the possible priming effects of the experimental words. Five of the
fillers were one-morpheme words (winter, candy, dinner, money, and wise) and five were
two-morpheme words (hunter, windy, winner, funny, and pies). The experimental and filler words
were randomized and each word was dictated, then used in a meaningful sentence and repeated.
The children were instructed to write each word on a pre-numbered response form.

(2) Experimental Morpheme Analysis Test. This measure was designed to assess the ability 
to analyze a spoken word into its constituent morphemes by requiring each child to identify base 
morphemes within words. This task consisted of the same 31 words that were used for spelling. 
The child was asked questions such as "Is there a smaller word in dust that means something 
like dust?" or "Is there a smaller word in kissed that means something like kissed?" for each
of the words. For one-morpheme words (such as dust, pant, and wind), the child was supposed 
to respond "No." For two-morpheme words (such as fussed, can't, and pinned), the child was 
supposed to respond "Yes" and supply the base word. 

These procedures were demonstrated in six training trials in the following manner. First, 
the child listened to each question and responded spontaneously. If the response was incorrect, 
the examiner repeated the question, provided the correct response along with a brief explanation, 
and asked the question again. This procedure was repeated once if needed. Words that contained 
smaller words that were not related to the stimulus word (such as pillow and sink) were included 
in the training trials and required "no" responses. On the test trials, no demonstrations or 
feedback were given. 
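
The composition of the 31-item list used in both tests can be written down as a small data structure, which makes the category counts easy to verify. The words and categories are those given under (1) above; the layout and the names `EXPERIMENTAL`, `FILLERS`, and `all_items` are illustrative assumptions.

```python
# Experimental words keyed by (number of morphemes, final consonant cluster).
EXPERIMENTAL = {
    (2, "md"): ["hummed", "jammed", "dimmed"],
    (1, "nd"): ["wind", "band", "kind"],
    (2, "nd"): ["pinned", "canned", "lined"],
    (1, "nt"): ["tent", "pant", "hint"],
    (2, "nt"): ["bent", "can't", "don't"],
    (1, "st"): ["list", "dust", "nest"],
    (2, "st"): ["kissed", "fussed", "messed"],
}

# Filler words keyed by number of morphemes.
FILLERS = {
    1: ["winter", "candy", "dinner", "money", "wise"],
    2: ["hunter", "windy", "winner", "funny", "pies"],
}

def all_items():
    """Flatten experimental and filler words into one list."""
    words = [w for group in EXPERIMENTAL.values() for w in group]
    words += [w for group in FILLERS.values() for w in group]
    return words

print(len(all_items()), "items in total")
```

Keying the experimental words by morpheme count and final cluster mirrors the two factors (morphemic and phonemic structure) on which the spelling errors are later compared.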

General Procedures 

The 86 children in the four groups were tested further to determine the relationship of 
their morpheme use in spoken language to their morpheme use in spelling and to their explicit
morpheme analysis ability. During the one-week period following administration of the
Berry-Talbott Test of Language (1966), each of the four groups of children was given the dictated
experimental spelling test in a half-hour group session. During the following three-week period,
each child was given the experimental morpheme analysis task and a letter naming task in an
individual testing session of approximately 20 minutes. To ensure consistent presentation of the
stimuli, all of the test items were presented on tape.

Results

Implicit Morphological Knowledge and Spelling Ability 

Letter naming scores were tabulated and showed that all but the low kindergarten children
could name over 90% of the letters of the alphabet, a skill needed for invented spellings.

For each child, the percentage of written words with final consonant omissions was also 
tabulated. The high first graders omitted final consonants from 3% of the words, the high kinder- 
garteners from 10% of the words, and the low first graders from 17% of the words. (Since low 
kindergarteners were not able to name the letters of the alphabet, their spelling results will not 
be discussed.) To determine if the groups differed in their tendency to omit final consonants, 
an analysis of variance was conducted with two between-groups factors (implicit morphological 
knowledge in spoken language, grade level). The analysis revealed a significant main effect of
implicit morphological knowledge, F(1,82) = 4.25, MSe = 5.97, p < .043, and a significant
interaction between morphological knowledge and grade level, F(2,82) = 12.63, MSe = 5.97, p < .001.

These results suggest that the ability to represent final consonants in written language is 
significantly related to morphological knowledge in spoken language and is not significantly related 
to grade level independent of linguistic ability. In other words, the low first graders omitted more
final consonants than did either the high first graders or the high kindergarteners. 

When the data are examined as a function of both morphemic and phonemic structure, they 
indicate that in omitting final consonants in their spelling, children tend not to be influenced 
by the phonemic structure of the words. It was found that the percentage of error on words 
ending in nasal and non-nasal consonant clusters was roughly the same: 8% and 7%, respectively.
In contrast, there was a striking effect of morphemic structure. Whereas children omitted final
consonants from only 4% of one-morpheme words, they omitted final consonants from 11% of
two-morpheme words, a difference that was highly significant, t(85) = 5.84, p < .001. It is clear
from these results that final consonants were omitted more often from two-morpheme than from
one-morpheme words, and that it was the morphologically less knowledgeable first graders who
were omitting those inflectional morphemes. 

Implicit and Explicit Levels of Morphological Knowledge 

In the morpheme analysis task, a two-morpheme word (such as pinned) was scored as correct 
if the child (1) responded "Yes" and supplied the correct base form of the word (pin), and 
(2) responded "No" to a phonemically similar one-morpheme word (wind). (The md words
[hummed, jammed, dimmed] were excluded from this scoring system because there are no
one-morpheme words in English that end in md.) The two-pronged scoring system was necessary to
counter possible effects of response bias. Without such a system, indiscriminate "no" responses
would result in higher scores than indiscriminate "yes" responses, since "yes" responses had to be
accompanied by the correct base word and "no" responses had no such control. By pairing words
with similar phonemic structure and contrasting morphemic structure, one could be certain that
"correct" responses validly represented sensitivity to morphemic structure and not inflation due
to response bias.
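
The two-pronged rule can be sketched in a few lines to show why it neutralizes response bias. Only the pinned/wind pairing comes from the text; the other two pairings, the response format, and the names `score_pair` and `percent_correct` are assumptions made for illustration.

```python
def score_pair(two_morph_response, one_morph_response, base):
    """Credit a pair only if both halves are answered correctly.

    Each response is a tuple: ("yes", supplied_word) or ("no", None).
    """
    answer2, supplied = two_morph_response
    answer1, _ = one_morph_response
    return answer2 == "yes" and supplied == base and answer1 == "no"

# (two-morpheme word, matched one-morpheme word, base form).
# pinned/wind is the pairing given in the text; the others are assumed.
PAIRS = [("pinned", "wind", "pin"),
         ("kissed", "list", "kiss"),
         ("bent", "tent", "bend")]

def percent_correct(responses):
    """Percentage of pairs credited; `responses` maps word -> response tuple."""
    hits = sum(score_pair(responses[two], responses[one], base)
               for two, one, base in PAIRS)
    return 100 * hits / len(PAIRS)

# A child who always answers "no", one who always answers "yes",
# and one who discriminates between the two kinds of word.
always_no = {w: ("no", None) for two, one, _ in PAIRS for w in (two, one)}
always_yes = {}
discriminating = {}
for two, one, base in PAIRS:
    always_yes[two] = ("yes", base)
    always_yes[one] = ("yes", None)
    discriminating[two] = ("yes", base)
    discriminating[one] = ("no", None)

for name, child in (("always no", always_no), ("always yes", always_yes),
                    ("discriminating", discriminating)):
    print(name, percent_correct(child))
```

Both indiscriminate strategies earn no credit under this rule, so a nonzero score requires telling two-morpheme words apart from their one-morpheme counterparts.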

Using this scoring system, the percentage of correctly analyzed word pairs was tabulated for 
each child. Both high first graders and high kindergarteners analyzed 48% of the pairs correctly, 
low first graders 24%, and low kindergarteners 3%. The correlation between the number of 
pairs a child analyzed correctly and morpheme use in spoken language proved to be significant, 
r(84) = .63, p < .001, indicating a strong relationship between implicit and explicit levels of 
morphological awareness. 

To determine if the groups of children differed in their ability to identify base morphemes 
in pairs of words that differed in morphemic complexity, an analysis of variance was conducted 
with two between-groups factors (implicit morphological knowledge in spoken language, grade 
level). The analysis revealed a significant main effect of implicit morphological knowledge, 
F(1,82) = 49.11, MSe = .05, p < .001, and grade, F(1,82) = 5.80, MSe = .05, p < .019. 
Moreover, the interaction between morphological knowledge and grade level was significant, 
F(2,82) = 4.31, MSe = .05, p < .042. In other words, the high kindergarteners and high first 
graders performed equally well. 

These results show that implicit morphological knowledge in spoken language (as assessed 
by the Berry-Talbott Test) is a more important discriminator of explicit morphological knowledge 
than is grade level. Implicit morphological awareness in spoken language accounted for 34% of 
the total variance in explicit morphological awareness, whereas grade level accounted for only 4%, 
and the interaction between group and grade for 3%. 

What is particularly notable about these results is that children with high levels of im- 
plicit morphological knowledge in the elicited spoken language task performed equally well on 
the explicit analysis task regardless of grade level differences. Therefore, the ability to analyze 
morphemic structure explicitly, at least as measured by this task and at this point in development, 
seems to be more highly related to implicit morphological knowledge in spoken language than to 
grade level factors such as age and amount of instructional experience. 

Discussion 

The purpose of this study was to investigate the development of morphological knowledge 
and its relationship to early spelling ability in kindergarten and first-grade children. Two levels of 
morphological knowledge were examined, since previous research has suggested that children need 
to understand morphophonemic structure implicitly and explicitly in order to spell. Although 
previous studies had shown that written language proficiency requires an explicit understanding 
of morphophonemic structure, the ability of young children to analyze the internal structure of 
words had been examined at the phonemic but not at the morphemic level of language. 

It was found, in accordance with previous studies of normal language acquisition, that chil- 
dren in kindergarten and first grade are still developing implicit morphological knowledge (as 
measured by the Berry-Talbott), and that they use certain morphological rules before others. 
Notably, in view of the large number of past tense items in the stimuli that were used to as- 
sess spelling and explicit analysis abilities, most of the kindergarteners and first graders in this 
study successfully applied the morphological rules for regular past tense (in the nonsense words 
trommed, flitched, linged, and bazinged). 

In addition, it was found that implicit morphological knowledge does not develop solely as 
a function of factors associated with grade level. This was seen by the fact that some kinder- 
garteners (the high group) performed significantly better than some first graders (the low group). 
However, the role of factors associated with grade level cannot be disregarded either, since high 
first graders performed significantly better than high kindergarteners, and low first graders per- 
formed significantly better than low kindergarteners. What is clear from these results is that 
kindergarteners and first graders vary greatly in their implicit knowledge of the morphology and 
that this variability affects their early spelling ability. 

In fact, implicit morphological knowledge had a more significant effect than grade level on 
the tendency of young children to omit inflectional morphemes in spelling. This was seen by the 
fact that low first graders made relatively more of these errors than either high first graders or 
high kindergarteners. Furthermore, the poorly developed implicit morphological knowledge of the 
low first graders correlated highly with their poor performance on the morphemic analysis task. 

Considering previous research on phonemic analysis, it was enlightening to examine the types 
of errors made by the low kindergarteners and low first graders when they attempted to analyze 
the morphemic structure of spoken words explicitly. It was found that many of these children 
could manipulate phonemic segments without understanding morphemic structure. For example, 
in response to the questions "Is there a smaller word in kind that means something like kind?" 
and "Is there a smaller word in dust that means something like dust?" they often responded 
"Yes, kin" or "Yes, tind" or "Yes, dus" or "Yes, tust." This finding highlights the importance 
of examining the ability to explicitly analyze the morphemic structure as well as the phonemic 
structure of words. 

Looking more closely at the results of the explicit morphemic analysis task, the fact that 
the high kindergarteners and high first graders performed with identical proficiency, despite their 
different amounts of instructional experience, raises an interesting question. Since the high first 
graders demonstrated a significantly higher level of implicit morphological knowledge than the 
high kindergarteners, it seems curious at first that these two groups demonstrated identical levels 
of explicit morphological knowledge. Apparently, the high first graders would have had to show 
a greater superiority in implicit morphological awareness over the high kindergarteners in order 
to demonstrate a more sophisticated level of explicit awareness. In addition, the explicit analysis 
task may not have been sensitive enough to detect differences between the two high groups. 
What seems clear is that the ability to analyze the morphophonemic structure of a word is to 
some degree independent of instructional experience at this age level, since high kindergarteners 
performed significantly better than low first graders. Since it is difficult to sort out the roles of 
linguistic ability and instructional experience at higher age levels, it is particularly helpful to begin 
to sort out these contributions for young children. By doing so, we can begin to develop more 
sensitive diagnostic measures to predict later language learning deficits and to design instructional 
procedures that will address the morphophonemic aspects of learning to read and spell. 

The present study demonstrates that children in both kindergarten and first grade vary 
considerably in their implicit and explicit knowledge of the morphology and that this variability 
affects their early attempts to represent base and inflectional morphemes in writing. It is 
clear from the obtained results that children who demonstrate weak implicit knowledge of 
morphological rules are also deficient in their ability to explicitly analyze the internal morphemic 
structure of words and to use inflectional morphemes in writing. Therefore, the greater tendency 
of the low first graders to omit inflectional morphemes in writing seems to reflect a deficiency in 
morphological knowledge, rather than just a general spelling problem. 

It is notable that, even though most of the children demonstrate their implicit knowledge of 
the past tense rule on the Berry-Talbott Test, only the children in the high groups show some 
degree of proficiency when explicitly analyzing the internal morphemic structure of past tense 
words. In contrast, the children in the low groups are relatively unable to analyze the internal 
morphemic structure of the past tense words, and omit relatively more past tense inflectional 
morphemes in writing. Yet they too were able to use the morphological rule for past tense on the 
Berry-Talbott Test. At least for the low first graders, this pattern of performance suggests that it 
is their lack of explicit awareness of morphemic structure that should cause us the most concern. 
Although these children demonstrate some ability to manipulate phonemic structure, based on the 
errors they made on the morpheme analysis task, they do not seem to understand that inflected 
words are composed of groups of phonemes that form morphemes. Therefore, it seems probable 
that their lack of explicit understanding of morphophonemic structure, in conjunction with their 
generally weak implicit knowledge of the morphology, accounts in large measure for the morphemic 
errors they make in their early spelling attempts. 

It seems clear, then, that even at the primary level, if children are to be good spellers, it is not 
enough for them to understand that words are made up of phonemic segments. Research into the 
spelling and written expression performance of older children and adults with learning problems 
demonstrates that errors on inflected and derived forms of words are a major characteristic of 
their written products. The results of this study suggest that the basis for such errors may be an 
underlying deficiency at the implicit level, and especially at the explicit level, of morphological 
knowledge. Therefore, it is of critical importance that we assess the morphological knowledge 
of young children so that we may identify those who are at risk for learning problems and help 
them to develop the sensitivity to morphophonemic structure that they need to become proficient 
written language users. 

In order to best help these children, it seems necessary to teach them to use grammatical 
morphemes correctly in their spoken language if they are to become competent in spelling inflected 
and derived forms of words. In addition, the present results suggest that it is critical to teach 
these children to become explicitly aware of the structure of their spoken language productions. 
It is this explicit awareness of their language that should help children to apprehend the internal 
structure of the new words that they are required to read and spell. Written language instruction 
should focus on the development of structural analysis skills at both the morphemic and phonemic 
levels. It is clear that children should be taught that words (whether they are spoken, read, or 
spelled) are composed of morphemes, which, in turn, are composed of phonemes. 

In conclusion, this study represents a first step in the examination of morphological knowledge 
in the spoken language of young children as it relates to their ability to represent morphemes in 
writing and their ability to analyze the internal morphemic structure of words. Since this is a 
new area of investigation, it is anticipated that these results will stimulate the development of 
other research studies. In the future, we need to conduct similar studies with learning disabled 
children, adolescents, and adults in an effort to account for the morphemic errors they make in 
reading and written expression. In this way, we can begin to document deficiencies in sensitivity 
to morphophonemic structure in these groups. It is hoped that studies of this type will result in 
improved diagnostic and instructional procedures for children and adults with language-learning 
disabilities. 

THE CROSSWORD PUZZLE PARADIGM: THE EFFECTIVENESS 
OF DIFFERENT WORD FRAGMENTS AS CUES FOR THE 
RETRIEVAL OF WORDS* 



Naomi Goldblum† and Ram Frost‡ 

Abstract. We investigated the internal structure of words in the men- 
tal lexicon by using a crossword puzzle paradigm. In two experiments, 
subjects were presented with word fragments along with a semantic 
cue, and were asked to retrieve the whole word that contained the pre- 
sented fragment and was compatible with the semantic information. 
In Experiment 1, we found that any cluster of adjacent three letters 
facilitated retrieval better than dispersed letters. Moreover, syllabic 
clusters had greater facilitative effect than nonsyllabic pronounceable 
clusters, or nonpronounceable clusters. In Experiment 2, we found 
that syllabic units facilitated retrieval more than morphemic units. 
The results are interpreted as evidence for the existence of lexical sub- 
units that are larger than the letter but smaller than the word, and 
that are organized according to phonologic principles. An interactive 
model for solving crossword puzzles is proposed. 

INTRODUCTION 

This paper is concerned with the following question: Does the mental lexicon contain units 
smaller than the whole word but larger than the individual letter, and if so, what kind of units 
are they? The previous answers to these questions seem to be modality-specific. There is wide 
agreement that syllabic units play an important role in auditory word perception (e.g., Kahn, 
1976; Mehler, Dommergues, Frauenfelder, & Segui, 1981; Segui, 1984). In research on visual word 
perception, on the other hand, there is conflicting evidence as to what the subword units might be, 
and whether or not the visually presented stimuli undergo phonologic as well as visual processing. 

* Memory & Cognition, in press. 

Spoehr and Smith (1975) have shown that a vocalic center group (VCG) is more easily perceived 
than a similar cluster of letters not containing a vowel. Their use of the VCG is based on the 
work of Hansen and Rodgers (1965), who define a VCG as a cluster consisting of a vowel with a 
consonant or consonants on either side, where the whole cluster forms a pronounceable unit. AN, 
CAN, ANT, and CANT are examples of VCGs. They have also shown that one-syllable words 
are processed faster and more accurately than two-syllable words of the same number of letters 
(Spoehr & Smith, 1973; see also Spoehr, 1978). From these results Spoehr and her colleagues 
concluded that words are represented in the lexicon according to their syllabic structure. 

In contrast to the phonological division suggested by Spoehr and her associates, Murrell 
and Morton (1974) and Taft and Forster (1975) proposed a morphological division into units. 
According to this view, polymorphemic words are stored in the lexicon in a morphologically 
decomposed fashion: the root and the information about prefixes and inflections. Thus, in the 
process of word recognition, the reader strips the prefixes, and accesses the morphological root 
first. A different division of written words into units was suggested by Taft (1979). Taft defined 
the minimal lexical unit as a Basic Orthographic Syllabic Structure (BOSS). The BOSS is formed 
by adding as many consonants as possible to the first vowel in the first syllable, without violating 
the orthotactic rules of English. Thus, the BOSS is a unit consisting of as many consonants as 
can legally be found together with one vowel, at the beginning or the end of a word. According 
to this view, in order to access a multimorphemic word in the mental lexicon, one first accesses 
its BOSS unit. In a series of experiments designed to investigate Taft's hypothesis, Lima and 
Pollatsek (1983) found no difference between the facilitative effect of syllables and BOSS units. 
They demonstrated, however, that either of these units was better than an arbitrary unit in 
priming a word of which they were a constituent. When a syllabic unit was also a morphemic 
unit of the word, then it was more facilitative than a syllabic unit that did not constitute a 
morpheme of the word. 

This inconsistency of results is puzzling but may perhaps be attributed to task characteristics. 
All of the above studies concern visual word perception, and most of them use the lexical decision 
paradigm. Usually, in the experiments described above, words that are parsed into units according 
to phonologic or orthographic principles are presented visually to the subject. Here, the speed 
and accuracy of lexical decisions to such parsed words is assumed to reflect the naturalness of 
these units. It is assumed that if lexical search is facilitated by a particular division of a word, 
then this division actually reflects important characteristics of the representation of this word in 
the internal lexicon. However, it has been recently suggested that lexical decisions, in many cases, 
do not involve more than superficial lexical access (Balota & Chumbley, 1984). Since all that is 
needed for lexical decision is a judgment concerning the probability that the letter string is a 
valid word, it is possible that, for at least some words, the decision is based on a fast judgment 
concerning the familiarity of the letter string. In such a case, the decision stage occurs prior to any 
deep analysis of meaning and morphemic structure. This suggestion is described in a two-stage 
model of lexical decision performance (Balota & Chumbley, 1984). According to this model, very 
familiar and very unfamiliar letter strings are processed superficially without lexical access. The 
letter string will undergo deeper processing that involves decomposition into units only when a 
fast decision concerning its familiarity cannot be reached. Consequently, in a lexical decision task 
where a whole word is presented to the subject, a division of the word into subunits may, in many 
cases, be irrelevant to the task. If this is the case, then the structure of the internal lexicon may 
not be accurately reflected by performance in lexical decision experiments. 

A retrieval task, on the other hand, can avoid the artifacts of the lexical decision process. 
If only a fragment of a word is presented, and the subject is asked to retrieve the whole word 
containing this fragment, the extent to which a particular fragment facilitates retrieval may reflect 
the functional role of this fragment in the lexicon. 

An example of such cue-facilitated retrieval is the process that occurs in the solving of 
crossword puzzles. When part of the word is filled in, the solver has two cues for the retrieval of 
the word: the filled-in letters in their appropriate places and the "definition," which is generally 
a synonym or some other associative term. When the solver cannot come up with the correct 
answer, he or she tries to fill in more letters by finding adjacent words. The solver usually chooses 
the position to be filled next, according to his or her intuition about the relative facilitatory effect 
of the positions that are still empty. This raises the following questions: What facilitates retrieval 
better, dispersed letters or letter clusters? Also, is there any difference among types of clusters? 
The study of the relative facilitatory effect of different types of word fragments may provide us, 
then, with useful clues about the structural representation of words in the mental lexicon. If 
words in the internal lexicon are actually organized in terms of subunits, it is more likely that 
people will make use of these subunits when they are presented with them and asked to retrieve 
the whole word, rather than simply make lexical decisions. 

A number of experiments using the word-fragment paradigm indicate that with the number 
of letters controlled, all fragments are not equally effective for the retrieval of words. Horowitz, 
White, and Atwood (1968) presented subjects with lists of nine-letter words to memorize, and then 
tested whether the first, middle, or last three-letter fragment facilitated recall most. They found 
that the first fragment was most facilitative, followed by the last and middle fragment in that order 
of facilitation. However, Horowitz and his colleagues did not control the pronounceability of the 
fragments or whether they corresponded to syllables. This factor might have had some influence 
on the results. Since the middle fragment of a nine-letter word is less likely to be pronounceable 
than either of the end fragments, the position of the fragment may have been confounded with 
its pronounceability. Using a similar procedure, Dolinsky (1973) repeated this experiment with 
a control for the presence of syllables. After his subjects were presented with a list of words, recall 
was cued by syllabic and nonsyllabic fragments taken from the beginning, middle, or end 
of the word. Dolinsky found that the presence of a syllable had a significant facilitative 
effect on retrieval only in the middle fragments. When the cues were the beginning or the final 
fragments, syllabic clusters did not facilitate recall better than nonsyllabic clusters. However, 
Dolinsky did not control for the pronounceability of the nonsyllable fragments and some of his 
nonsyllable controls were actually three letters of a four-letter syllable. 

In the present study the word-fragment technique was used to investigate what sublexical 
word units, if any, exist in the internal lexicon when the letter's position within the word is 
controlled. It is possible (1) that individual letters in a word act separately and in parallel to 
activate directly the word of which they are constituents, or (2) that any group of consecutive 
letters in a word constitutes a unit, or (3) that only very specific groups of consecutive letters have 
an activating effect greater than that of individual dispersed letters. If there are no middle-sized 
units in the lexicon, then all fragments of the same length should be equally helpful in retrieving 
a word. If letters grouped together are more effective in activating a word, then any group of 
consecutive letters should be a better retrieval cue than the same number of dispersed letters. If, 
however, there are specific groupings of letters that constitute units in the internal lexicon (e.g., 
syllables), then these specific groupings should be more effective cues for word retrieval than any 
other groupings of the same length. 

EXPERIMENT 1 

Experiment 1 was designed to investigate whether letter clusters facilitate retrieval more 
than dispersed letters, and whether syllabic units are more facilitative than any other cluster of 
letters independently of their position in the word. To this end, syllabic units were compared 
with three types of fragments: pronounceable nonsyllabic clusters, unpronounceable clusters, and 
nonadjacent letters. To avoid the effect of length of cluster, all word fragments were composed of 
different combinations of three letters. For example, the target word "VINDICTIVE" was cued 
by the synonym "spiteful," together with one of the following four fragments: 

1. - - - D I C - - - -  (syllable) 

2. - - - - I C T - - -  (pronounceable nonsyllable) 

3. - - N D I - - - - -  (unpronounceable cluster) 

4. - - N - I - T - - -  (nonadjacent letters) 
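
The four cue displays for this example can be generated mechanically. The following is a minimal sketch, assuming the display shows the cued letters in place and a dash for every missing letter; the function name `render_cue` and the 0-based position lists are illustrative and do not come from the original materials.

```python
def render_cue(word, positions, blank="-"):
    """Show the letters at the given 0-based positions; replace the rest with dashes."""
    shown = set(positions)
    return " ".join(ch if i in shown else blank for i, ch in enumerate(word))


word = "VINDICTIVE"
cues = {
    "syllable": range(3, 6),                   # DIC
    "pronounceable nonsyllable": range(4, 7),  # ICT
    "unpronounceable cluster": range(2, 5),    # NDI
    "nonadjacent letters": (2, 4, 6),          # N, I, T
}
for label, positions in cues.items():
    print(f"{render_cue(word, positions)}  ({label})")
```

Each display contains exactly three visible letters, so the four conditions differ only in how those letters are grouped.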

If there are no units larger than the individual letter in the internal lexicon, then any three 
letters of a word should be just as good a retrieval cue as any other three letters situated in similar 
positions within the word. If it is the clustering of the letters in itself that facilitates retrieval, 
then any cluster should be better than dispersed letters, without any difference between clusters 
of different types. If it is merely the pronounceability of the cluster that facilitates retrieval, 
then pronounceable clusters should be as facilitative as true syllables. If, however, syllables do 
constitute functional units in the internal lexicon, then a syllable should be more facilitative for 
the retrieval of the target word than any of the other fragments. 

Methods 

Subjects. Sixty-four undergraduate students at the Hebrew University of Jerusalem par- 
ticipated in the experiment for course credit or for payment. All subjects were native English 
speakers. 

Stimuli and design. The stimuli were 48 English words: 22 nouns, 8 verbs, and 18 adjec- 
tives. All the words had three syllables and were from seven to ten letters long. Their frequency, 
according to Kučera and Francis (1967), ranged from 0 to 45, with a median of 10.5. There was no 
significant difference between the frequencies of the fragments of each type of cluster, according 
to the trigram frequency list presented by Underwood and Schulz (1960). 

Four different types of fragments for each word were presented: a syllable, a pronounceable 
cluster that was not a syllable of this word, an unpronounceable cluster,¹ and three nonadjacent 
letters. Syllables were defined according to Webster's New World Dictionary of the American 
Language (1964). In those cases where the dictionary proposed two divisions, phonologic and 
orthographic, the phonologic division was used. All fragment types consisted of three letters; 
dashes were presented in place of all the missing letters. To eliminate the possibility that the 
number of vowels or consonants in the fragment might have some effect on retrieval, only fragments 
consisting of two consonants and one vowel were used. In order to ensure that the effect of the 
type of fragment was not confounded with the effect of the fragment's position, all the possible 
positions within the word were sampled. For the syllabic fragments, the first, the middle, and 
the last syllables were presented equally. In the isolated letters condition, half of the trials 
included either the first or the last letters of the word, and the remaining trials did not. The 
unpronounceable fragments were always in the middle of the word, as there are no words in which 
the first and the last fragments are unpronounceable, given the constraint that the fragment must 
contain a vowel. A semantic cue for the word, that is, a word or a phrase with approximately the 
same meaning as the target word, was presented in lowercase letters just above the letters-dashes 
configuration. 

¹ By unpronounceable clusters we mean clusters that are phonotactically irregular in English. 

Each word was presented with all four types of fragments, so as to serve as its own control. 
The subjects were divided into four groups. Each group was presented with only one of the four 
fragments of each word, in one of the possible fragment positions. Each group was presented with 
an equal number of words in each of the four fragment types. The different words in the different 
conditions were assigned to the four groups of subjects by means of a Latin square design, so that 
no subject saw a word more than once. The list of target words and fragments is presented in 
the Appendix. 
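
The counterbalancing described above can be sketched as a simple rotation. This is only an illustration under stated assumptions: the rotation rule `(word + group) mod 4` and the function name are hypothetical, and the sketch ignores the additional balancing of fragment positions that the actual design included.

```python
FRAGMENT_TYPES = [
    "syllable",
    "pronounceable nonsyllable",
    "unpronounceable cluster",
    "nonadjacent letters",
]


def latin_square_assignment(n_words=48, n_groups=4):
    """Map each subject group to {word_index: fragment_type}.

    Words are rotated through the fragment types so that each group sees
    every word exactly once, and every (word, fragment type) pairing is
    seen by exactly one group.
    """
    assignment = {g: {} for g in range(n_groups)}
    for w in range(n_words):
        for g in range(n_groups):
            assignment[g][w] = FRAGMENT_TYPES[(w + g) % len(FRAGMENT_TYPES)]
    return assignment


groups = latin_square_assignment()
# Number of words per fragment type seen by the first group
counts = {t: sum(1 for f in groups[0].values() if f == t) for t in FRAGMENT_TYPES}
print(counts)
```

With 48 words and four groups, this rotation gives every group 12 words in each fragment condition, and every word appears in all four conditions across groups.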

Procedure and apparatus. The subjects were seated approximately 70 cm from a CRT 
screen in a semi-darkened room. Each stimulus appeared on the screen after the subject pressed a 
"start" button. The experimenter pressed a "finish" button when the correct answer was given by 
the subject, and only then was the stimulus removed from the screen. This procedure was deemed 
necessary because subjects often made incorrect spontaneous vocal responses. Consequently, a 
voice key for determining the exact reaction time could not have been used. However, in order to 
avoid an experimenter bias, the experimenter did not face the screen, and was not aware of the 
specific fragment condition in each trial. Rather the experimenter was presented with a parallel 
list that contained all the correct responses, and pressed the "finish" button accordingly. If the 
subject gave an incorrect answer, he or she was told that it was incorrect and was allowed to 
guess again. If, however, the subject did not give the correct response in 30 seconds, the stimulus 
disappeared, reaction time (RT) was recorded as 30 seconds, and the trial was considered as a 
"no response" trial. Stimuli presentations and RT measurements were controlled by a PDP 11/23 
computer. The subjects were presented with three practice trials before the test stimuli were 
presented. 
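
The scoring rule for timed-out trials can be illustrated with a short sketch. It assumes that the 30-second values recorded for "no response" trials enter the mean RT, as the procedure suggests; the function name and the sample data are hypothetical.

```python
TIMEOUT_S = 30.0  # cutoff stated in the procedure


def summarize_trials(response_times):
    """Summarize one subject's trials in one condition.

    response_times: seconds to the correct answer, or None if no correct
        answer was given before the cutoff.
    Returns (mean RT with timeouts counted as 30 s, percent "no response").
    """
    timed_out = [rt is None or rt >= TIMEOUT_S for rt in response_times]
    scored = [TIMEOUT_S if out else rt for rt, out in zip(response_times, timed_out)]
    mean_rt = sum(scored) / len(scored)
    percent_no_response = 100.0 * sum(timed_out) / len(scored)
    return mean_rt, percent_no_response


# Hypothetical data: three solved trials and one timeout
mean_rt, pct_no_response = summarize_trials([4.0, 12.0, None, 14.0])
print(mean_rt, pct_no_response)
```

Because timeouts are scored at the ceiling, the mean RT and the "no response" rate are not independent measures; conditions with many timeouts will necessarily show long mean RTs.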

Results and Discussion 

Reaction times in seconds and "no response" rates were calculated and averaged for the four 
experimental groups across the four fragment conditions. They are presented in Table 1. 

The mean reaction times of each type of fragment were calculated across the different posi- 
tions within the word. A one-way ANOVA revealed that the differences between the mean RTs 
to the different fragment types were significant, F1(3,189) = 83.5, p < 0.001, and F2(3,141) = 
29.5, p < 0.001; MinF' = 21.8, p < 0.05. 
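
The reported MinF' value can be checked against the by-subjects and by-items F ratios. The sketch below uses the standard min F' formula (Clark, 1973); that the authors computed it this way is an assumption, although the reported figure agrees with it. The function name is illustrative.

```python
def min_f_prime(f1, df1_error, f2, df2_error):
    """Return min F' and its approximate denominator degrees of freedom.

    f1, df1_error: F ratio and error df from the by-subjects analysis
    f2, df2_error: F ratio and error df from the by-items analysis
    """
    min_f = (f1 * f2) / (f1 + f2)
    denom_df = (f1 + f2) ** 2 / (f1 ** 2 / df2_error + f2 ** 2 / df1_error)
    return min_f, denom_df


min_f, denom_df = min_f_prime(83.5, 189, 29.5, 141)
print(round(min_f, 1))  # 21.8, matching the value reported in the text
```

The numerator degrees of freedom are unchanged (3 here); only the denominator df is approximated.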

Table 1 

Mean Reaction Time in Seconds, Percent of "No Response," 
and (SDs), in the Four Fragment Conditions. 

                        Syllable      Pronounceable   Unpronounceable   Nonadjacent 
                                      nonsyllable     cluster           letters 

Reaction time           11.6 ( 4.2)   16.4 ( 4.7)     19.0 ( 4.1)       20.9 ( 3.6) 

Percent of              24.7 (15.2)   40.1 (18.4)     50.9 (15.5)       54.9 (16.0) 
no response 



Planned comparisons were performed only between those groups of words in each condition 
for which the fragment clusters were at comparable positions within the words. Thus, the results 
were based strictly on the effect of the fragment type without being confounded with position 
effects. The results of the planned comparisons are presented in Table 2. 



Table 2 

Planned Comparisons of Reaction Times in Seconds, 
between Pairs of Fragment Conditions, with Subject (SR) 
and Word (WR) Random. 

Conditions            Mean percent of     Mean RT       t value 
compared              no response (SD)    (SD) 

Syllable              22.4 (16.4)         11.2 (4.5)    SR  t(63) = 5.16, p < 0.001 
 vs. 
Pron. Nonsyl.         37.5 (20.7)         15.5 (5.2)    WR  t(35) = 2.53, p < 0.02 

Pron. Nonsyl.         48.4 (21.0)         18.6 (5.2)    SR  t(63) = 0.89, n.s. 
 vs. 
Unpron. Clus.         44.8 (17.3)         17.9 (4.7)    WR  t(35) = 0.30, n.s. 

Pron. Nonsyl.         37.1 (20.7)         15.5 (5.2)    SR  t(63) = 8.08, p < 0.001 
 vs. 
Nonadj. Lett.         55.6 (17.7)         21.0 (4.1)    WR  t(35) = 3.80, p < 0.001 

Unpron. Clus.         55.5 (22.2)         20.6 (5.2)    SR  t(63) = 2.72, p < 0.008 
 vs. 
Nonadj. Lett.         64.8 (22.5)         23.0 (5.0)    WR  t(23) = 2.38, p < 0.026 

The results clearly demonstrate that the syllabic fragments are better retrieval cues than 
any other fragment in a given position in the word. It is of interest to note that there was no 
significant difference between the two kinds of nonsyllabic clusters: the pronounceable nonsyllable 
and the unpronounceable cluster. However, it is clear that clustering in itself facilitates retrieval, 
as any cluster yielded better performance than the nonadjacent letters. 

While considering the facilitation of syllabic fragments versus pronounceable nonsyllabic 
fragments, one cannot disregard the fact that for many words the division into syllables is con- 
troversial. Although in English some words have clear syllabic boundaries (e.g., "after"), for 
many words the syllabic boundaries are not well defined (e.g., "dagger"). These words contain 
ambisyllabic segments in most cases, in which a clear and unequivocal break does not exist. Am- 
bisyllabicity is the major cause for having more than one theory of syllabification in English, 
because different parsings into syllables can be suggested for many words (see Kahn, 1976). 

The issue of ambiguous syllabification is not only a linguistic issue, but also a psychological 
and methodological one. It might be the case that some of the controversy that revolves around 
the effect of syllables in word perception is due to the use of stimuli whose syllabification is 
ambiguous. One may suggest that the use of such stimuli might have prevented the researchers 
from finding a clear facilitation for syllabic units. In the present study, however, we found a 
strong facilitation of syllabic clusters even though a great number of the experimental stimuli 
contained ambisyllabic segments. We believe that even greater facilitation can be demonstrated 
while using only words that have unequivocal syllabic boundaries. 

Unambiguous syllabifications can be easily differentiated from ambiguous ones. Although
linguists disagree about the correct syllabic boundaries of many words, there is a set of
syllabification rules that they do agree upon. For example, it is generally accepted that a syllable
must begin and end with consonants or sequences of consonants that are legal in word-initial and
word-final position, that adjacent vowels belong to different syllables, and that the stressed
syllable will contain the maximal permissible number of consonants.

Given the great theoretical relevance of syllabification ambiguity, we examined the results
separately for those words whose syllabification is unambiguous. The differences between the
syllabic and the nonsyllabic pronounceable clusters only increased: RT=9.6 (SD=5.5) and RT=14.9
(SD=9.2) for syllabic and nonsyllabic fragments, respectively. The results for percentage of "no
response" were similar: 16.2% for syllables and 36% for nonsyllabic fragments.

EXPERIMENT 2 

The results of Experiment 1 showed that syllables facilitate retrieval of words from semantic
memory. However, it is not clear whether the facilitation that was found for syllabic units should
be attributed to phonology or to morphology. Experiment 2 was designed to address this issue
by investigating the relative facilitative effect of phonologic units versus morphemic units.

Chomsky and Halle (1968) suggested that morphemic rather than phonologic units are stored
in lexical memory in English. This suggestion is based on the claim that the syllabic structure of
a word changes in a systematic way when affixes are added to it, while the underlying morphemic
structure remains the same. Thus, it is more parsimonious to store the morphemic structure
together with the rules for generating the phonologic structure according to the affixes added to
the basic word.

Another source of evidence supporting the existence of morphemic units derives from reading 
research. Marcel (1980) suggested that in the process of reading, the reader parses the letter 
string not only by a cumulative and exhaustive procedure, but also according to morphemic 
specifications that are in the visual lexicon. Kay and Marcel (1981) presented subjects with 
nonwords containing legal morphemes and demonstrated that naming latencies depended on their 
pronunciation regularity. Kay and Marcel therefore suggested that morphemic units are probably
the basis of generating phonology in beginning readers. 

A different technique for investigating lexical units was suggested by Prinzmetal, Treiman,
and Rho (1986). They presented subjects with a target letter followed briefly by a string of
colored letters. Prinzmetal et al. demonstrated that subjects sometimes report seeing letters
and colors in incorrect combinations (illusory conjunctions). Hence, they investigated in what
type of letter combinations these illusory conjunctions are more likely to occur. Their results
suggested that syllables defined by purely phonological principles did not affect feature integration.
In contrast, syllables that were defined by morphological boundaries were functional units in the
visual analysis.

However, morphemic and syllabic units tend to overlap to a great extent. In most English
words the morphemic units are either identical with the syllabic units or else have one more letter
at the end. This overlapping of units may be one of the reasons for the difficulty in obtaining
clear-cut results concerning their effects. Therefore, in Experiment 2 we employed
stimuli that contain morphemic and syllabic units that do not overlap.

Methods 

Subjects. Forty-eight undergraduate students from the Hebrew University, all native English
speakers, participated in the experiment for course credit or for payment. 

Stimuli and design. The stimuli were 24 English words: 7 nouns, 4 verbs, and 13 adjectives.
Twenty-one of the words had four syllables, while the remaining three had five. The words were
seven to twelve letters long. Their frequencies, according to Kučera and Francis (1967), ranged
from 0 to 43, with a median of 7. All the words were of Greek or Latin origin, and their
decomposition into morphemes was defined according to Aronoff (1976). In order to avoid a
confounding with the fragment position within the word, only the middle fragments were used
as cues. Each word contained a middle morphemic unit and a middle syllabic unit that was not
contained within the morphemic unit. Words of this type are words that are not pronounced
according to their morphemic structure. For example, the morphemes of the word "monotonous"
are "mono," "ton," and "ous," while the stressed syllable (which was the phonetic unit used in
every case) is "not." We could therefore compare the effects of "__NOT_____" and "____TON___"
as cues, together with the semantic synonym: "boring; dull." Each cue was
presented with a morphemic fragment to one group of subjects and with a syllabic fragment to
another group of subjects. Altogether, the subjects in each group saw each word only once. They
were presented with half of the syllabic fragments and half of the morphemic fragments, randomly
selected. The procedure and apparatus were identical to those in Experiment 1. The list of target
words is presented in the Appendix.

Results and Discussion

The mean retrieval time and the percentage of "no answer" for words cued by morphemic
fragments and for words cued by phonetic fragments are presented in Table 3.

Table 3 

Mean Reaction Time in Seconds, Percentage of "No Response," 
and (SDs), for Words Cued by Morphemic and Phonetic Fragments. 



                     Morphemic fragment     Phonetic fragment

Reaction Time              16.3                   13.3
                          ( 3.5)                 ( 4.2)

"No response"              4x1%                   29.2%
                          (14.0)                 (16.5)



The differences in reaction times were significant with subjects as random variable, and with
words as random variable: t(47) = 5.23, p < 0.001; t(23) = 1.92, p < 0.065, respectively.

Experiment 2 thus showed that, at least for those words used in this study, syllabic units are
more facilitative for the retrieval of words than are morphemic units. These results apparently
conflict with findings in experiments that employed lexical decision and naming tasks and yielded
better performance for words that were parsed according to morphemic principles (e.g., Murrell
& Morton, 1974; Taft, 1979). This discrepancy in results deserves attention.

The comparison of morphemic and syllabic units in English is methodologically problematic,
as the results are heavily dependent on the choice of units in each experiment. The morphemic
units that were used by Taft (1979) or Murrell and Morton (1974) consisted of independent lexical
units (i.e., ordinary words of the language). Therefore, there is no question that these units are
stored as such in the internal lexicon, and for those specific words it is reasonable to assume that
the morphemic units convey more information than any other units.

The empirical question that we addressed in this experiment refers to the comparison of
morphemic and syllabic units that do not have an independent lexical status. However, as was
previously pointed out, in most of these cases the syllabic and the morphologic segmentations
overlap. Hence, the only set of stimuli that allows one to test the relative facilitation of phonologic
and morphemic units is the one that does not confound syllables and morphemes. Unfortunately,
this set of words is usually comprised of words of Greek or Latin origin, and the naive reader is
usually unaware of the morphemes' meaning. The results of Experiment 2 clearly demonstrate,
at least for this type of word, that morphemic units do not play an important role. These
units are theoretical constructs used by linguists to explain the structures of English words. Our
results suggest that people do not have a deep linguistic knowledge of their language. Units that
do not have a phenomenological reality for the individual do not have a psychological reality.

In conclusion, although our results do not rule out the possibility that some morphemes
might be better cues, they conflict with a strong version of morphemic lexical structure that
claims that only morphemes are stored in the lexicon. The pattern of cue facilitation obtained
in Experiment 2 suggests that phonologic units do play a role in the retrieval of words and, all
other things being equal, they are better cues for word retrieval. Since phonologic units have also
been shown to play a role in the perception of both auditorily and visually presented words (e.g.,
Mehler et al., 1981; and Spoehr & Smith, 1975, respectively), they are thus seen to be involved
in many aspects of the internal processing of words.

GENERAL DISCUSSION 

In the present study we investigated the nature of word units in the internal lexicon by 
using a crossword puzzle paradigm. Experiment 1 showed that any grouping of letters is more 
facilitative than dispersed letters in retrieving words from memory. This result, however, is not 
surprising. It appears that the information afforded by a given set of clustered letters is more than 
the sum of the information afforded by each of the cluster's constituents alone. This conclusion is 
in accordance with McClelland and Rumelhart's model of word recognition (1981). According to 
their model, the greater activation of three adjacent letters derives from the pattern of activation 
characteristic of any adjacent positions. The claim, however, for the existence of units in the 
lexicon does not refer only to the relative position of letters at the letter level, but also to the 
existence of independent subunits above the letter level but below the word level.

The controversy resides in the definition of these units. The results of Experiments 1 and 2
taken together demonstrate that phonologic units are more facilitative for the retrieval of words
than are any other units. It is important to note that this effect cannot be attributed to
pronounceability factors alone. In Experiment 1, there was no significant difference between the
nonsyllabic pronounceable and unpronounceable clusters; moreover, the syllabic cluster facilitated
retrieval more than either one of them.

In Experiment 2, we directly tested the relative facilitation caused by syllabic and morphemic 
units. Although we cannot rule out the possibility that morphemic units also play some role in the 
internal processing of words, we suggest that syllabic units are more central. Thus, we propose 
that syllabic units are stored as such in the lexicon. 

A model based on this hypothesis can be constructed as an extension of the interactive
model of the lexicon proposed by McClelland and Rumelhart (1981). Using similar principles,
we too propose a model in which words are connected by excitatory links to the letters they
are composed of. However, we suggest that the word and letter nodes are mediated by a third
level that is comprised of letter units. These units reside between the word level and the letter
level and are organized according to syllabic principles. According to this model, a word can be
recognized or retrieved on the basis of the isolated letters contained in it. However, retrieval is
facilitated if the intermediate syllabic units are activated by a previously presented cue. This is
because the syllabic units are more closely related to the word level than are the dispersed letters.
In the crossword puzzle task, when a syllabic configuration is presented to the solver, it directly 
activates the node in the lexical network that is consistent with the presented information. This 
node, however, only rarely activates a single word node, as usually more than one word contains 
one specific syllable. If the word cannot be retrieved, then the addition of semantic information 
may eliminate some of the possible word candidates and may cause greater activation in the 
remaining ones. The complete activation of a specific word in the network (i.e., the retrieval of 
that word) is aided, therefore, by the additional semantic cue. The semantic information that is 
given with the letter configuration activates in parallel, through top-down processes, those word
nodes that are consistent with it. The combination of the unit's bottom-up activation and the
semantic information's top-down activation finally enables the retrieval of the target word from
the lexicon. By the same argument, the addition of any single letter to the letter configuration
will also narrow the number of competing words, thus facilitating retrieval. If, however, the added
letter completes a syllabic unit, the increase of bottom-up activation will be comprised of two
factors: (1) the added activation of that specific letter and, more importantly, (2) the
additional activation of the completed syllabic unit. Thus, the completion of a full syllabic unit
increases the probability of word retrieval.
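
The two-factor account of bottom-up activation can be illustrated with a small sketch. This is only a toy illustration under stated assumptions, not an implementation of the proposed model: the miniature lexicon, the syllabifications, and the two weights (`LETTER_WEIGHT`, `SYLLABLE_WEIGHT`) are hypothetical, and the sketch omits the top-down semantic activation and any advantage of letter adjacency as such.

```python
# Toy sketch of bottom-up activation with an intermediate syllabic level.
# The lexicon, syllabifications, and weights are illustrative assumptions.

LEXICON = {
    "MERCURY": ["MER", "CU", "RY"],
    "HARMONY": ["HAR", "MO", "NY"],
    "FACTUAL": ["FAC", "TU", "AL"],
}

LETTER_WEIGHT = 1.0    # assumed activation contributed by each revealed letter
SYLLABLE_WEIGHT = 2.0  # assumed extra activation from a completed syllabic unit


def parse_cue(cue):
    """Return {position: letter} for the revealed letters of a cue like 'MER____'."""
    return {i: ch for i, ch in enumerate(cue) if ch != "_"}


def syllable_spans(syllables):
    """Yield (start, end) index pairs of each syllable within the word."""
    start = 0
    for syl in syllables:
        yield start, start + len(syl)
        start += len(syl)


def activation(word, syllables, cue):
    """Bottom-up activation of a word node given a letter-fragment cue."""
    if len(cue) != len(word):
        return 0.0
    revealed = parse_cue(cue)
    # A revealed letter that mismatches the word rules the word out.
    if any(word[i] != ch for i, ch in revealed.items()):
        return 0.0
    # Factor 1: activation from each revealed letter.
    score = LETTER_WEIGHT * len(revealed)
    # Factor 2: additional activation for every fully revealed syllabic unit.
    for start, end in syllable_spans(syllables):
        if all(i in revealed for i in range(start, end)):
            score += SYLLABLE_WEIGHT
    return score


if __name__ == "__main__":
    for cue in ("MER____", "_ERC___", "M_R_U__"):
        print(cue, activation("MERCURY", LEXICON["MERCURY"], cue))
```

With these assumed weights, a cue that completes a syllable receives more activation than a nonsyllabic cluster or a set of dispersed letters of the same size, which is the qualitative ordering the two-factor argument requires.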

Note that although the stimuli in the present experiments were presented in the visual
modality, by no means do we suggest that only the visual lexicon is involved in the process of word
retrieval. As the retrieval task requires relatively long reaction times, and may not tap on-line
processing, it is reasonable to believe that both the auditory and the visual lexicons are involved
in the task. In many cases, the final activation of a word node (i.e., reported word retrieval) can
derive from activation of either one of the lexicons or both. Regardless of this possibility, we
believe that the differences in the relative facilitation of the visually presented letter clusters
reflect their relative lexical status.

In conclusion, we suggest that the word-fragment completion task is a sensitive test for in- 
vestigating lexical structure. Results from this task suggest that subunits of words that are larger 
than the letter unit are probably stored in the mental lexicon along with the words themselves. 
These subunits and their interconnections make up the lexical word. As syllables appear to be the 
best cue for word retrieval, we suggest that syllabic units have a strong lexical reality. The exact 
formal definition of the syllabic units in many English words is the source of large disagreement 
among linguists. This question, however, might be regarded as an empirical and psychological 
question. Thus, the word-fragment completion task could provide empirical evidence that might 
influence current linguistic theories. 
Stimuli used in Experiment 1

Synonym                   Syllabic     Pron.        Unpron.   Nonadj.         Target
                                       nonsyllab.   cluster   letters

liquid metal              MER          ERC          RCU       M, R, U         MERCURY
uninhabited place         WIL          [illegible]  LDE       W, E, N         WILDERNESS
unpaid worker             VOL          LUN          NTE       V, U, T         VOLUNTEER
pierce                    PEN          NET          ETR       P, N, A         PENETRATE
enchant                   CAP          TIV          PTI       C, T, V         CAPTIVATE
roast                     BAR          BEC          RBE       B, R, E         BARBECUE
careless                  NEG          LIG          EGL       N, L, E         NEGLIGENT
move around               CIR          CUL          RCU       C, R, A         CIRCULATE
invent                    FAB          RIC          ABR       F, R, A         FABRICATE
agreement                 HAR          MON          RMO       H, R, O         HARMONY
true                      FAC          ACT          CTU       F, C, U         FACTUAL
enlarge                   MAG          NIF          AGN       M, G, I         MAGNIFY
copy                      PLI          DUP          UPL       P, I, T         DUPLICATE
disgust                   VUL          REV          LSI       V, L, I         REVULSION
aspect                    MEN          DIM          NSI       M, N, I         DIMENSION
protective                FEN          DEF          NSI       F, N, I         DEFENSIVE
unwilling                 LUC          REL          CTA       L, [illegible]  RELUCTANT
unbiased                  PAR          IMP          MPA       P, R, I         IMPARTIAL
leavetaking               PAR          DEP          RTU       P, R, U         DEPARTURE
amusement                 VER          DIV          RSI       V, R, I         DIVERSION
loathsome                 PUL          REP          LSI       P, L, I         REPULSIVE
resentful                 DIG          IND          NDI       D, G, [illegible]  INDIGNANT
choosy                    LEC          SEL          CTI       L, C, I         SELECTIVE
lonely state              CLU          SEC          ECL       E, L, S         SECLUSION
irresistible force        [illegible]  OMP          MPU       O, P, [illegible]  COMPULSION
continual                 SIS          ERS          RSI       R, I, T         PERSISTENT
confused                  [illegible]  ILD          LDE       W, L, E         BEWILDERED
forecast                  [illegible]  [illegible]  CTI       D, C, I         PREDICTION
thorough                  TEN          ENS          NSI       T, N, I         INTENSIVE
crowded condition         [illegible]  [illegible]  NGE       O, G, S         CONGESTION
not wanted                WEL          ELC          LCO       W, L, O         UNWELCOME
spiteful                  DIC          ICT          NDI       N, I, T         VINDICTIVE
friend                    PAN          OMP          MPA       O, P, N         COMPANION
repay                     PEN          ENS          MPE       O, P, N         COMPENSATE
stamina                   [illegible]  [illegible]  NDU       D, [illegible], C  ENDURANCE
excellence                FEC          ECT          RFE       F, C, I         PERFECTION
hurdle                    CLE          TAC          ACL       B, C, E         OBSTACLE
increase                  PLY          ULT          IPL       L, P, Y         MULTIPLY
retaliation               SAL          RIS          EPR       P, I, L         REPRISAL
upright                   CAL          ERT          RTI       E, T, L         VERTICAL
unique                    LAR          GUL          NGU       I, G, R         SINGULAR
international negotiator  MAT          LOM          IPL       P, O, T         DIPLOMAT
reference book            NAC          MAN          LMA       L, A, C         ALMANAC
watchman                  NEL          TIN          NTI       N, I, L         SENTINEL
buttoned sweater          GAN          DIG          RDI       A, D, N         CARDIGAN
cruel                     MAN          HUM          NHU       N, U, N         INHUMAN
biased                    SAN          TIS          RTI       R, I, N         PARTISAN
deviant                   MAL          ORM          RMA       O, M, L         ABNORMAL
Stimuli used in Experiment 2

Synonym                          Phonetic unit   Morphemic unit   Target

not pertinent                    REL             LEV              IRRELEVANT
disrespectful                    REV             REV              IRREVERENT
manage skillfully; control       NIP             PUL              MANIPULATE
boring; dull                     NOT             TON              MONOTONOUS
meat-eating                      NIV             VOR              CARNIVOROUS
grow or spread rapidly           LIF             FER              PROLIFERATE
exclusive control of ownership   NOP             POL              MONOPOLY
disloyalty; unfaithfulness       DEL             FID              INFIDELITY
component structure              NAT             TOM              ANATOMY
all-powerful                     NIP             POT              OMNIPOTENT
[illegible]                      NIF             FIC              MAGNIFICENT
tightly joined                   SEP             PAR              INSEPARABLE
[illegible]                      RAT             PAR              APPARATUS
look forward to                  TIC             CIP              ANTICIPATE
independence                     TON             NOM              AUTONOMY
kind; generous                   [illegible]     VOL              BENEVOLENT
[illegible] to decide            RES             SOL              IRRESOLUTE
vague; not exact                 DEF             FIN              INDEFINITE
mix uniformly                    MOG             GEN              HOMOGENIZE
vigorous; full of pep            GET             ERG              ENERGETIC
conflicting feelings             BIV             VAL              AMBIVALENCE
secret; not to be disclosed      DEN             FID              CONFIDENTIAL
applied science                  NOL             LOG              TECHNOLOGY
unlawful                         GIT             LEG              ILLEGITIMATE
ON THE POSSIBLE ROLE OF AUDITORY SHORT-TERM ADAPTATION
IN PERCEPTION OF THE PREVOCALIC [m]-[n] CONTRAST*



Bruno H. Repp 



Abstract. Acoustic information about the place of articulation of a
prevocalic nasal consonant is distributed over two distinct signal
portions, the nasal murmur and the onset of the following vowel. The
spectral properties of these signal portions are perceptually important,
as is their relationship (the pattern of spectral change). A series of
experiments was conducted to investigate to what extent relational place
of articulation information derives from a peripheral auditory
interaction, viz., short-term adaptation caused by the murmur.
Experimental manipulations intended to disrupt the effects of such
adaptation included separation of the murmur and the vowel by intervals
of silence, presentation to different ears, and reversal of order. Other
tests of the possible role of adaptation included manipulation of
murmur duration, murmur-vowel cross-splicing, and high-pass filtering of
the excised vowel onset. While the results of several experiments were
compatible with the peripheral adaptation hypothesis, others did not
support it. An alternative hypothesis, that the manner cues provided
by the murmur are crucial for accurate place judgments, was also
discredited. It was concluded that, at least under good listening
conditions, the perception of spectral relationships does not depend on
peripheral auditory enhancement and probably rests on a central
comparison process.

INTRODUCTION 

The present study continues recent research on the perceptual integration of nasal murmur 
and vowel onset cues to the [m]-[n] distinction in CV syllables (Kurowski & Blumstein, 1984; Repp,
1986). Kurowski and Blumstein showed that each of these signal portions may carry considerable 

* Journal of the Acoustical Society of America, in press. 
place of articulation information, and that subjects' identification performance is better when
both are present (as they normally are) than when only one is. They suggested that the two cues 
may function as a "single auditory property." However, their data also seemed consistent with 
the alternative possibility that the two cues are processed separately and combined at a later, 
evaluative stage in perception (see, e.g., Massaro & Oden, 1980). Repp referred to these two 
hypotheses as single-cue (or early integration) and multiple-cue (or late integration), respectively. 

In addition to replicating and extending Kurowski and Blumstein's findings using a multi- 
talker stimulus set, Repp made a preliminary attempt to address these two integration hypotheses. 
He formulated a simple probabilistic model of late information integration that predicted identifi- 
cation accuracy when two cues are available from identification performance for each cue presented 
in isolation. The predictions of the model generally fell short of the obtained identification scores,
which was taken to mean that perceptual integration did occur at a relatively early stage, as 
hypothesized by Kurowski and Blumstein. However, the model may well have been too simple to 
represent the processes of cognitive information integration. Another relevant piece of information 
obtained in Repp's study was that murmur and vowel onset cues still appeared to be integrated 
better than predicted by the model (or, in any case, permitted surprisingly high identification 
scores) when as much as 60 ms of the waveform surrounding the point of articulatory release was 
replaced with noise. This finding casts doubt on the role of a peripheral integration mechanism, 
since such a mechanism presumably should have been more sensitive to disruption of physical 
continuity. However, the noise may have enabled listeners to "restore" the missing acoustic infor-
mation (cf. Warren, 1984). Clearly, Repp's data were not sufficient to decide between the early 
and late integration hypotheses, and further research was called for. 
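
To make the logic of such a late-integration prediction concrete, the following sketch shows one simple way of predicting two-cue accuracy from single-cue accuracy. It is an illustrative assumption, not necessarily the model Repp (1986) used: it treats the two cues as independent sources, corrects each single-cue score for guessing in a two-alternative task, and combines them by probability summation. The function names and the example scores are hypothetical.

```python
# Sketch of an independent-cues (late integration) prediction for a
# two-alternative identification task. This particular form (guessing
# correction plus probability summation) is an assumption made for
# illustration only.

def corrected(p, chance=0.5):
    """Proportion of trials on which a cue is assumed to be truly informative."""
    return max(0.0, (p - chance) / (1.0 - chance))


def predict_combined(p_murmur, p_vowel, chance=0.5):
    """Predicted proportion correct when both cues are available."""
    d1 = corrected(p_murmur, chance)
    d2 = corrected(p_vowel, chance)
    # The listener is assumed to succeed if at least one cue is informative.
    d_both = 1.0 - (1.0 - d1) * (1.0 - d2)
    # Add back the contribution of guessing on the remaining trials.
    return chance + (1.0 - chance) * d_both


if __name__ == "__main__":
    # Hypothetical single-cue scores, not data from the study.
    print(predict_combined(0.80, 0.70))
```

Under a scheme of this kind, obtained two-cue scores that exceed the prediction would be taken as a sign that the cues interact before the decision stage, which is the inference pattern described above.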

The concept of late integration needs little justification, since separate sources of information 
can always be combined in cognitive decision making as long as they are available at the same
time (see, e.g., Massaro & Oden, 1980). The concept of early integration is more controversial, 
however. According to Kurowski and Blumstein's hypothesis, murmur and vowel onset "are 
not represented as separate cues, but are integrated by the auditory system into one unitary 
representation" (p. 389, emphasis added). As support for this claim, they cite the physiological 
studies of Delgutte (1980; Delgutte & Kiang, 1984), who found in cats that the neural response 
to a vowel onset was altered by a preceding nasal murmur, due to short-term adaptation of 
auditory nerve fibers. Kurowski and Blumstein conclude from this finding that "the auditory 
system does not treat transitions [i.e., the vowel onset] separately from the murmur" (p. 389). 
However, while Delgutte's results suggest that the auditory representation of the vowel onset is 
not independent of the preceding murmur, it does not follow that the two signal components, 
therefore, form an auditory unit. That is, one must distinguish between early integration, which 
yields a single auditory property, and early interaction among stimulus portions, which may 
modify their auditory representations while preserving them as separate sources of information 
that could be integrated by a later, cognitive process. Auditory adaptation would seem to be a 
likely source of early stimulus component interaction, but it is not clear how it ever could merge 
two signal portions of very different spectral structure and considerable temporal extent. Indeed,
adaptation serves to enhance spectral changes in the signal (Summerfield, Haggard, Foster, & 
Gray, 1984) and thus is a mechanism of differentiation, not of integration. Thus, early integration 
of the kind envisioned by Kurowski and Blumstein seems unlikely as a general auditory function.
Rather, the concept seems to reflect the axiomatic belief that single auditory properties underlie 
phonetic distinctions. This assumption is intended to relieve the listener's perceptual system from 
a computational burden, which instead falls upon the investigator trying to define the critical
properties (Repp, 1987b). 

Instead of the early integration hypothesis, therefore, the present series of experiments is con- 
cerned mainly with the perceptual consequences of early auditory interaction — henceforth, the 
(auditory short-term) adaptation hypothesis. Auditory short-term adaptation has been amply 
demonstrated not only in animals' auditory nerves (see also, e.g., Abbas & Gorga, 1981;
Eggermont, 1985; Harris & Dallos, 1979; Smith, 1979) but also behaviorally in humans in the form
of forward masking, decay of sensation, and auditory aftereffects (e.g., Plomp, 1964; Viemeister
& Bacon, 1982; Widin & Viemeister, 1979; Wilson, 1970; Zwislocki, Pirodda, & Rubin, 1959),
including tasks involving phonetic judgments (Summerfield et al., 1984; Summerfield & Assmann,
1987), even though adaptation may not be the only factor contributing to these phenomena. For
all we know, then, auditory adaptation occurs continuously as we listen to speech. The question 
is: Does it help speech perception? Summerfield et al. (1984) and Summerfield and Assmann 
(1987) have argued that adaptation serves to enhance regions of spectral change, and that this 
may increase the intelligibility of speech, especially in noisy environments. In the specific case 
that concerns us here, viz., prevocalic nasal consonants, significant spectral change occurs at the 
point of release, where the nasal murmur changes into the vowel (and also beyond that point, 
during the formant transitions in the vowel). The murmur thus presumably has an adapting effect 
on the vowel onset that is proportional to the murmur spectrum, resulting mainly in attenuation 
of frequencies below 1000 Hz, where the murmur has most of its energy. Since distinctive place of 
articulation information is located at higher frequencies, some enhancement of vowel onset cues 
may result from the suppression of irrelevant spectral components (cf. Danaher & Pickett, 1975;
Hannley & Dorman, 1983). The transitions of the second and third formants following vowel
onset may also be enhanced somewhat by the (weak) presence of these formants in the murmur. 
More generally, the negative aftereffect of the murmur results in a direct auditory representation 
of the differences in spectral amplitude between the murmur and the onset of the vowel. This di- 
rect spectral difference information may be perceptually valuable, especially for the labile [mi]-[ni] 
distinction (Repp, 1986). 

It could be that such relational spectral information is the critical cue for place of articulation 
distinctions. (See Lahiri, Gewirth, & Blumstein, 1984.) This need not be so, however, for the
murmur, as well as the later portions of the vowel, provide additional spectral (and temporal)
information that may feed into a central integration process. Repp's (1986) preliminary acoustic 
analyses suggest that spectral difference information alone is not sufficient to distinguish [m] 
and [n] across all vowel contexts, at least not in an invariant fashion. It also seems to vary in 
perceptual importance depending on the vowel, being more essential in [-i] than in [-a] context, for 
example. Thus it may be only one of several ingredients that enter into phonetic decisions. This 
means that the inputs to the central decision process probably include the murmur spectrum, 
the spectral relationship between the murmur and the vowel onset, and the continuing pattern of 
spectral change during the vowel. 

The present series of experiments was designed to test the adaptation hypothesis in a variety 
of ways. To repeat, that hypothesis states that adaptation by the nasal murmur modifies the in- 
ternal representation of the vowel onset spectrum and thus makes spectral difference information 
directly available to the auditory system, which is important for the correct perception of place of 
articulation. Therefore, identification scores should drop if the effect of adaptation is reduced or
eliminated. It was assumed that auditory adaptation, being a peripheral process, would be sensi-
tive to disruptions of the physical continuity of murmur and vowel, so Experiments 1-3 introduced 
manipulations such as order reversal, spatial separation, and temporal separation of murmur and 
vowel components. If such disruptions reduced identification performance substantially, a role of 
peripheral adaptation in providing place of articulation information would be suggested. If they 
had no effect, the auditory adaptation hypothesis could be rejected. A potential problem with 
this approach is that it is quite possible that spectral difference information, if it is not available 
as the direct consequence of peripheral auditory adaptation (and even if it is), is computed at a 
higher level in the perceptual system with the help of auditory memory (see Summerfield & Ass- 
mann, 1987; Summerfield et al., 1984), as suggested, for example, by research on auditory profile 
analysis (see Green, 1983). Such a central comparison process may also be sensitive to disrup- 
tions of physical continuity, and unless such disruptions turn out to be ineffective, the outcome of 
the experiments will be consistent with both peripheral and central explanations. To distinguish
further between these accounts, Experiments 4, 6, and 7 examined several predictions thought to 
be specific to peripheral adaptation, concerning the effects on intelligibility of murmur duration, 
murmur/vowel mismatches, and simulated spectral enhancement at vowel onset. Experiment 5 
addressed two alternative hypotheses, which will be introduced at that point. 

I. GENERAL METHODS 

A. Subjects 

Three different groups of 12 or 13 student volunteers served as paid subjects, each in a single 
session including several experiments. All subjects were native speakers of American English and 
considered themselves to be free of hearing problems. 

B. Stimuli 

The same basic stimulus set as in Repp (1986) was used, and the earlier article may be 
consulted for details. Briefly, the stimuli were [ma, mi, mu, na, ni, nu] produced by three male 
and three female talkers, 36 syllables in all. The syllables were low-pass filtered at 4.9 kHz, 
digitized at 10 kHz, and modified as required. The onsets of three pitch periods (or pairs of pitch
periods, in female tokens) preceding and following the point of release were marked to serve as 
cutpoints in waveform editing. The temporal distance between these markers was approximately 
10 ms. 

C. Procedure 

The subjects listened in a quiet room over TDH-39 earphones at a comfortable intensity. 
Unless mentioned otherwise, all stimulus presentations were binaural with interstimulus intervals 
of 3 s. The subjects in Experiments 1-4 made a forced choice between /m/ and /n/ for each
stimulus, guessing when no nasal consonant was perceived. The subjects in Experiments 5-7 
used a free response set including /m, n, b, d/ and /-/ (no consonant) as explicit choices. The 
first group of subjects participated in Experiments 1 and 4; the second group in Experiments 3 
and 2; and the third group in Experiments 5, 6, and 7, in fixed order. (The experiments were 
renumbered for expository reasons in this article.)

D. Data Analysis

Analyses of variance were performed on overall identification scores both across subjects 
(averaged over talkers) and across talkers (averaged over subjects). Therefore, two F values will 
be reported for each effect tested.1 Differences among individual syllables will be discussed in a
qualitative fashion. 
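
The two-F-value scheme described above can be sketched as follows. This is a minimal illustration, assuming a one-way repeated-measures design and a hypothetical `scores[s][t][c]` array of percent-correct values (subject s, talker t, condition c); the function names are illustrative, and the original analyses may have been computed differently.

```python
from statistics import mean

def repeated_measures_f(matrix):
    """One-way repeated-measures ANOVA (sketch).

    matrix[i][j] is the score of unit i (a subject or a talker) in
    condition j. Returns (F, df_condition, df_error). A zero error term
    (perfectly consistent condition effects) would raise ZeroDivisionError.
    """
    n = len(matrix)        # number of units (subjects or talkers)
    k = len(matrix[0])     # number of conditions
    grand = mean(x for row in matrix for x in row)
    cond_means = [mean(row[j] for row in matrix) for j in range(k)]
    unit_means = [mean(row) for row in matrix]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_unit = k * sum((m - grand) ** 2 for m in unit_means)
    ss_total = sum((x - grand) ** 2 for row in matrix for x in row)
    ss_error = ss_total - ss_cond - ss_unit   # condition x unit interaction
    df_cond, df_error = k - 1, (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error

def by_subjects_and_talkers(scores):
    """scores[s][t][c] -> (by-subjects result, by-talkers result)."""
    n_s, n_t, n_c = len(scores), len(scores[0]), len(scores[0][0])
    # Average over talkers to get one row per subject.
    by_subject = [[mean(scores[s][t][c] for t in range(n_t)) for c in range(n_c)]
                  for s in range(n_s)]
    # Average over subjects to get one row per talker.
    by_talker = [[mean(scores[s][t][c] for s in range(n_s)) for c in range(n_c)]
                 for t in range(n_t)]
    return repeated_measures_f(by_subject), repeated_measures_f(by_talker)
```

With 12 subjects, 6 talkers, and two conditions, this yields degrees of freedom of (1, 11) across subjects and (1, 5) across talkers, which is the pattern of the paired F values reported in the text.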



1 Because of frequent perfect scores, an arcsine transformation of proportions was not used.
It is believed that none of the conclusions would have changed, had such a transformation been
applied.

II. EXPERIMENT 1

Experiment 1 tested the auditory adaptation hypothesis in a drastic fashion by reversing the
order of the murmur and vowel components. Clearly, this manipulation eliminates any adapting
effect the murmur might have on the vowel onset. Therefore, if adaptation enhances place of
articulation cues, performance in the reversed condition should be much worse than when the
murmur immediately precedes the vowel. On the other hand, if most of the place of articulation
information results from processing the two sources of information separately and coding them in
a more permanent form before central integration (e.g., as vectors of likelihoods of category
membership; see Chistovich, 1985; Massaro & Oden, 1980), then their order might be less important.
However, if important spectral relationships are extracted centrally, that process may well be
sensitive to order also. Thus it was perhaps unlikely that no decline in performance would result
from an order reversal; nevertheless, the fact that this result would provide conclusive evidence
against the auditory adaptation hypothesis justified the experiment.

A. Methods

The experiment included five conditions, each represented by a test sequence consisting of
one randomization of the 36 stimuli. The first sequence contained the original, unaltered syllables
and served as warm-up. The second sequence contained the same syllables, but with about 60
ms of the waveform surrounding the point of release excised. In other words, approximately the
last 30 ms of the murmur and the first 30 ms of the vowel (each corresponding to three male or
six female pitch pulses) were removed and the two truncated stimulus components were joined
together. This excision was done to increase the number of errors and thus to reduce ceiling
effects. The relatively abrupt change from the murmur to the vowel was thought to enhance
the effect of adaptation on the remaining place of articulation cues in the vowel, or at least
not to decrease it. That the truncated vowel portions, as well as the truncated murmurs, still
contained considerable place of articulation information was clear from earlier data (Repp, 1986).
To confirm this, and to illustrate to the subjects the nature of the separate stimulus components,
the third and fourth test sequences contained the truncated murmurs and vowels, respectively,
in isolation. The critical fifth sequence contained the truncated vowels followed by the truncated
murmurs after a 300 ms silent interval. This interval was inserted to prevent the perception of
postvocalic nasal consonants.

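
The waveform editing used for the truncated and reversed conditions of Experiment 1 can be sketched as follows. This is a simplified illustration: it assumes a waveform stored as a list of samples at the 10 kHz rate given under Stimuli, a known release index, and a fixed 30-ms cut on each side, whereas the article placed its cutpoints at marked pitch-period onsets. All function names are hypothetical.

```python
SAMPLE_RATE_HZ = 10_000  # digitization rate given in the Stimuli section

def ms_to_samples(ms, rate=SAMPLE_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * rate / 1000)

def truncate_components(waveform, release_index, cut_ms=30):
    """Split a syllable at the release and drop cut_ms on each side.

    Returns (truncated_murmur, truncated_vowel). The fixed cut is an
    assumption; the article cut at marked pitch-period onsets instead.
    """
    cut = ms_to_samples(cut_ms)
    murmur = waveform[:max(release_index - cut, 0)]
    vowel = waveform[release_index + cut:]
    return murmur, vowel

def murmur_then_vowel(murmur, vowel):
    """Second sequence: truncated murmur joined directly to truncated vowel."""
    return murmur + vowel

def vowel_then_murmur(murmur, vowel, gap_ms=300):
    """Fifth sequence: truncated vowel, silent interval, truncated murmur."""
    silence = [0.0] * ms_to_samples(gap_ms)
    return vowel + silence + murmur
```

The third and fourth sequences correspond simply to presenting the two return values of `truncate_components` on their own.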
B. Results and Discussion

The results, averaged over subjects and talkers, are summarized in Table 1. Performance for
the unaltered syllables was 95% correct; nearly all errors occurred with [ni]. Excision of 60 ms
surrounding the release caused a 10% drop in the average score, although identification of [ma]
and [na] remained unaffected. Scores for isolated truncated murmurs and vowels were 56 and 61%
correct, respectively. From these scores, Repp's (1986) simple late integration formula predicts
an overall performance of 66% correct for murmurs and vowels combined, without any relational
information added. Clearly, however, such relational information played a role when murmur and
vowel were concatenated (condition 2): Scores were much higher than predicted. In condition 5, on
the other hand, where the murmur followed the vowel, performance was 67% correct. This is close
to the predictions of the model, and while it is marginally better than identification of isolated
vowels, F(1, 11) = 4.25, p = .0636; F(1, 5) = 9.50, p = .0274, it is substantially lower than the 85%
correct obtained in the second condition, F(1, 11) = 49.35, p < .0001; F(1, 5) = 48.75, p = .0009.
As Table 1 shows, this latter difference was obtained for all individual syllables, even though they
differed markedly in their vulnerability to truncation.
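
The logic of a late-integration prediction can be illustrated with a small sketch. The rule below is an assumption chosen for illustration (a product rule for two independent cues in a two-alternative task, in the spirit of models such as Massaro & Oden, 1980); it is not necessarily the formula used by Repp (1986), which is not reproduced in this article.

```python
def combine_independent(p_murmur, p_vowel):
    """Predicted proportion correct when two independent cues are combined.

    Assumed rule (not taken from the article): the support for the correct
    alternative is the product of the two cue accuracies, normalized by the
    summed support for both alternatives.
    """
    both_right = p_murmur * p_vowel
    both_wrong = (1 - p_murmur) * (1 - p_vowel)
    return both_right / (both_right + both_wrong)

# Isolated truncated murmurs (56%) and vowels (61%) from Experiment 1.
predicted = combine_independent(0.56, 0.61)
print(f"{100 * predicted:.1f}% predicted from independent cues")
```

Under this assumed rule the prediction comes out at roughly 66.6%, in the same range as the 66% cited in the text and the 67% observed in condition 5, and well below the 85% obtained when the murmur immediately preceded the vowel.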



Table 1

Percent Correct Scores for Individual Syllables
in the Five Conditions of Experiment 1.
(M = murmur, V = vowel.)

                                      Syllables
Conditions            [mi]   [ni]   [ma]   [na]   [mu]   [nu]  Average

Full syllable           97     74    100     99    100    100       95
M + V                   68     64     99     99     89     92       85
M                       56     47     65     49     61     58       56
V                       51     49     58     71     57     81       61
V + (300 ms) + M        57     47     82     76     68     71       67


These results confirm the important perceptual role of spectral difference information. When
this information is directly available, speech intelligibility is much higher than when listeners can
rely only on the cognitive integration of independent sources of information. Models of speech
perception that assume the integration of independent cues (e.g., Massaro & Oden, 1980) are
incomplete in this respect. The results are thus consistent with the adaptation hypothesis, but
they cannot be taken as direct support for it. Relational information could also be derived
by a nonperipheral spectral comparison process sensitive to temporal order and/or temporal
separation.

III. EXPERIMENT 2

Before turning to finer parametric stimulus variations, the results of a second gross manipulation
will be reported. The rationale for Experiment 2 was that, if adaptation takes place in the
peripheral auditory system, it should be sufficient to present the stimulus components to different
ears to eliminate it. Summerfield et al. (1984) found that an auditory aftereffect believed to rest
on adaptation disappeared when the adapting and test stimuli were presented to opposite ears.
However, any central processes that extract spectral relationships might operate on inputs from 
different ears. As in Experiment 1, it was perhaps unlikely that the segregation of murmur and 
vowel would have no effect at all on intelligibility, but the strong implications such an outcome 
would have for the adaptation hypothesis made the experiment worthwhile. 

A. Methods 

The same truncated murmurs and vowels as in Experiment 1 were used. There were three 
conditions, each consisting of one presentation of the 36 stimuli. In contrast to Experiment 1, 
however, the three conditions were randomized together. Two conditions were identical with con- 
ditions 2 (truncated murmur immediately followed by truncated vowel) and 4 (isolated truncated 
vowels) of Experiment 1, except that presentation was monaural. In the third, "split" condition, 
the truncated murmur occurred on the opposite channel, immediately preceding the truncated
vowel, which was on the same channel as the other stimuli. Half the subjects received the vowel
portions in the left ear, and half in the right ear. No ear differences were apparent, so the data 
were pooled over this variable. 

B. Results and Discussion 

Performance for the monaural murmur-vowel stimuli was 86% correct, which is similar to
the score obtained (with different subjects) in Experiment 1. Performance for isolated vowels
(67% correct) was somewhat higher than in Experiment 1, but matches the score obtained by
Repp (1986). Performance in the novel split condition was 78% correct, significantly higher than
for isolated vowels, F(1, 11) = 17.47, p = .0015; F(1, 5) = 9.08, p = .0297, but lower than for
monaural murmur-vowel stimuli, F(1, 11) = 23.93, p = .0005; F(1, 5) = 8.07, p = .0362.

Differences among individual syllables may be examined in Table 2. It appears that [m-]
syllables gained more from the addition of a contralateral murmur to the isolated vowel than did
[n-] syllables. This is surprising in the case of [mi], whose murmur by itself conveyed very little
reliable information, whereas the murmurs of [ma] and [mu] yielded the highest scores in isolation
(see Table 1; also, Repp, 1986) and therefore were expected to make a large contribution. In the
case of [na] and [nu], the negligible gain may have been due to the fact that the isolated vowels
were identified almost as well as the monaural murmur-vowel stimuli. The possibility of response
biases cannot be ruled out.2 If the task is considered as one of [m]-[n] discrimination within each
vocalic context (e.g., if percent correct scores are computed for [m]-[n] pairs), all inconsistencies
disappear, and performance in the split condition is intermediate between the other two conditions
in all three vocalic contexts.

2 Although it seemed at times as if isolated murmurs elicited a response bias in favor of
/m/ (cf. also Malecot, 1956), this tendency may indicate that labial place of articulation is more
effectively conveyed by the murmur spectrum than is alveolar place. It also depends on the original
vocalic context in a way that can be rationalized by reference to speech production (Repp, 1986).
It is not clear, therefore, whether a meaningful distinction between discriminability and response
bias can be made.



Table 2

Percent Correct Scores for Individual Syllables
in the Three Conditions of Experiment 2.
(M = murmur, V = vowel, / = split between ears.)

                                      Syllables
Conditions            [mi]   [ni]   [ma]   [na]   [mu]   [nu]  Average

M + V                   72     75    100     83     92     93       86
V                       49     53     79     77     57     86       67
M / V                   76     43     97     76     83     90       78
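
The pairwise [m]-[n] scores mentioned in the text can be computed directly from Table 2, as in the sketch below. Two [na] cells (M + V and V) are difficult to read in the available copy; the values 83 and 77 used here are assumed readings, chosen because they agree with the printed row averages of 86 and 67. The variable and function names are illustrative.

```python
# Percent correct from Table 2. The two [na] entries marked below are
# assumed readings of cells that are hard to read in the source.
TABLE_2 = {
    "M + V": {"mi": 72, "ni": 75, "ma": 100, "na": 83, "mu": 92, "nu": 93},  # na assumed
    "V":     {"mi": 49, "ni": 53, "ma": 79,  "na": 77, "mu": 57, "nu": 86},  # na assumed
    "M / V": {"mi": 76, "ni": 43, "ma": 97,  "na": 76, "mu": 83, "nu": 90},
}

def pair_scores(row):
    """Average the [m] and [n] scores within each vocalic context."""
    return {v: (row["m" + v] + row["n" + v]) / 2 for v in ("i", "a", "u")}

PAIRS = {condition: pair_scores(row) for condition, row in TABLE_2.items()}

def split_is_intermediate(vowel):
    """True if the split condition falls between V alone and monaural M + V."""
    return PAIRS["V"][vowel] < PAIRS["M / V"][vowel] < PAIRS["M + V"][vowel]

for vowel in ("i", "a", "u"):
    print(vowel, PAIRS["V"][vowel], PAIRS["M / V"][vowel], PAIRS["M + V"][vowel],
          split_is_intermediate(vowel))
```

With these values the split condition is intermediate in all three vocalic contexts (for example, 51, 59.5, and 73.5 for the [-i] pair), as the text states.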



The results suggest, then, that channel separation of murmur and vowel disrupts the extraction
of spectral difference information. This is consistent with the adaptation hypothesis,
but it could also be that there is a central process of spectral comparison that is sensitive to 
spatial separation of sound sources. The scores in the split condition seem fairly close to what 
one should expect on the basis of late integration of independent sources of information, so the 
central process responsible for that integration presumably was not affected. While the results
of Experiment 2, like those of Experiment 1, do not permit rejection of any specific hypothesis, 
they do suggest that spatio-temporal contiguity of signal components is required for the effective 
detection of relational spectral cues. 

IV. EXPERIMENT 3 

The obvious next step was to determine how close in time the two signal components must
be for listeners to reap the benefits of spectral difference information. One of the more striking
findings of Repp (1986) was that substitution of signal-correlated noise (SCN) for the 60 ms
of waveform surrounding the consonantal release resulted only in a relatively small decrement in
overall identification performance; the syllables [mi] and [ni] supplied virtually all the errors. Repp
concluded that murmur and residual vowel onset cues were perceptually integrated across the
intervening noise; that is, it appeared that spectral difference information remained largely intact.3
This result is not necessarily damaging to the adaptation hypothesis. Short-term adaptation may
last for 150 ms or more (Delgutte, 1980; Summerfield et al., 1984), and a brief broadband noise
may dilute but not eliminate the effect, just as would decay of adaptation during a 60-ms silent
interval. However, if this interval were extended, a substantial decrement in adaptation should
be observed.

3 A related result has been obtained by Whalen and Samuel (1985), who substituted a nonspeech
noise for the initial 60 ms of the vowel in fricative-vowel syllables and found that classification
reaction time was slowed when the fricative noise had been cross-spliced from a different
vocalic context. That is, listeners detected subtle phonetic mismatches between fricative noise
and vowel across a 60-ms intervening noise, just as they did when no noise was present. The
detection of such mismatches may rest on the extraction of spectral difference information from
the speech signal.

To test these predictions, Experiment 3 assessed identification performance for stimuli whose 
murmur and vowel components were separated by silent intervals of up to 240 ms duration. The
use of silence rather than noise was justified by the results of another experiment, not reported in 
detail here, which showed that intervening signal-correlated noise, broadband noise, and silence 
had statistically equivalent effects.4

A. Methods 

The truncated murmur and vowel components were used again, separated by 0, 30, 60, 120, 
or 240 ms of silence. All five conditions were randomized together and recorded in five blocks of 
36 syllables each. 

B. Results and Discussion 

The results are summarized in Table 3. There was no decline in performance over the first 60
ms of separation. Only at the longer intervals was there a small reduction in performance. Overall,
the effect of temporal separation was significant across subjects, F(4, 44) = 3.70, p = .0111, but
not across talkers. With regard to individual syllables, it can be seen that identifiability declined
with silence duration for [n-] but not for [m-] syllables. This may once again have been due either
to an /m/ response bias that emerged as the murmur was separated from the vowel, or it may
indicate that labial place of articulation was perceptually more stable under these conditions.



Table 3

Percent Correct Scores for Individual Syllables
in the Five Conditions of Experiment 3.

Silence                               Syllables
Duration              [mi]   [ni]   [ma]   [na]   [mu]   [nu]  Average

0 ms                    71     67    100     89     92     94       85
30 ms                   81     57     99     94     93     96       87
60 ms                   78     51    100     90     93     94       85
120 ms                  74     53     99     86     99     81       82
240 ms                  76     54     97     83     92     76       80



4 Signal-correlated noise is spectrally uniform (Schroeder, 1968) but preserves the amplitude
envelope of the replaced signal, which may aid listeners in "restoring" missing phonetic information
(see Warren, 1984; Whalen & Samuel, 1985); if anything, however, the noise interfered more
with consonant identification than did silence. In a recent study using similar methods, Parker
and Diehl (1984) likewise found no difference between the effects of intervening noise and silence
on vowel identification performance in "centerless" CVC syllables, and Whalen (1984) also found
effects of fricative-vowel mismatches across an intervening 60-ms silent interval, just as he did
across an intervening noise (Whalen & Samuel, 1985).

These results are not so easy to reconcile with the adaptation hypothesis. First, the decline
in performance was small and did not occur with all syllables and talkers. Second, there seemed
to be no decline at all over the first 60 ms of separation, although auditory adaptation, which
decays exponentially (Eggermont, 1985; Harris & Dallos, 1979), should have decreased significantly
in that interval. Since the truncated murmur and vowel components were in their original
temporal relationship when separated by 60 ms, a perceptual advantage resulting from this fact
may conceivably have counteracted any decline due to decay of adaptation at short intervals.
Apparently, however, listeners still had spectral difference information available with 240 ms of
temporal separation, and this suggests that they used auditory memory for the murmur to determine
its spectral relationship to the vowel onset. Whether this was a compensatory perceptual
strategy or whether it reflects what occurs in intact syllables is not clear.
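
The expectation that adaptation should already have decayed noticeably after 60 ms can be made concrete with a small sketch of exponential decay. The time constant used below is purely hypothetical and is not taken from this article or from the physiological studies it cites; it serves only to show the shape of the prediction across the silent intervals of Experiment 3.

```python
import math

def remaining_adaptation(gap_ms, tau_ms):
    """Fraction of an exponentially decaying effect left after gap_ms."""
    return math.exp(-gap_ms / tau_ms)

GAPS_MS = (0, 30, 60, 120, 240)  # silent intervals used in Experiment 3
TAU_MS = 60.0                    # hypothetical time constant (assumption)

for gap in GAPS_MS:
    fraction = remaining_adaptation(gap, TAU_MS)
    print(f"{gap:>3} ms gap: {fraction:.2f} of the effect remains")
```

Whatever time constant is assumed, a purely exponential account predicts the steepest loss at the shortest intervals, which is the opposite of the observed pattern of no decline over the first 60 ms and a small decline only at 120 and 240 ms.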



V. EXPERIMENT 4

Experiment 4 addressed two further predictions of the adaptation hypothesis, which contrasted
with predictions arising from the alternate hypothesis that murmur and vowel onset
function as independent cues that are integrated at a late stage (e.g., Massaro & Oden, 1980).
One prediction concerned the effect of murmur duration. Physiological studies have shown that
auditory adaptation in animals increases with adaptor duration up to about 100 ms (Harris &
Dallos, 1979; Westerman & Smith, 1984). Even though the temporal parameters may not be
exactly the same in the human auditory system, to the extent that auditory adaptation by the
murmur enhances the spectral structure at vowel onset, there should be a beneficial effect of
increasing murmur duration (up to about 100 ms) on identification of murmur-vowel stimuli. In
isolated murmurs, however, there can be no such enhancing effect of adaptation; therefore,
increasing murmur duration beyond some minimum should have little influence on intelligibility.
This was already suggested by Repp's (1986) analysis of the effect of natural variations in murmur
duration; in addition, he found that the intelligibility of truly steady-state isolated murmurs
decreased as their duration was increased, perhaps because their artificial quality became more
apparent as they got longer. Thus a statistical interaction of the effect of murmur duration with
the factor of presence versus absence of a following vowel is predicted. A contrasting prediction
emerges from the late integration of independent cues hypothesis: Whether increasing murmur
duration increases or decreases the informational value of the murmur, it should do so regardless
of the context in which the murmur occurs.

A second prediction examined by Experiment 4 was this: If auditory adaptation caused by
the murmur improves perception of higher formants at vowel onset, then a beneficial effect of
prefixing an isolated vowel portion with a murmur should be obtained regardless of whether or
not the murmur derives from the same utterance. The reason is that all murmurs are spectrally
rather similar below 1000 Hz, where most of their energy is concentrated. And although [m]
and [n] murmurs differ in the frequencies of their higher formants, which are continuous with
the formants at vowel onset, it may be argued that the spectral change at vowel onset would be
enhanced even more if the murmur formants were different from those at vowel onset. The
paradoxical prediction is, therefore, that addition of an inappropriate murmur to an isolated vowel
may improve identification, relative to the isolated vowel baseline. The opposite result is
predicted by the independent cues hypothesis: The introduction of a conflicting cue cannot possibly
improve performance. (Late integration of murmur and vowel onset cues may occur following an
early auditory interaction, in which case two opposing tendencies may cancel in the data.) To
test these predictions, the experiment included both compatible and conflicting murmur-vowel
combinations. Thus it was also possible to compare directly two types of conditions for nasal
consonants that previously have been employed in separate studies (Kurowski & Blumstein, 1984;
Malecot, 1956) or with other place of articulation contrasts (Recasens, 1983).

A. Methods 

The experiment included one long randomized test sequence composed of 9 x 36 = 324
stimuli, and a shorter sequence of 3 x 36 = 108 stimuli. The stimulus components were
steady-state murmurs generated by reiterating a single 10-ms segment of the original murmurs, taken
from the vicinity of the release (see the Static Excerpts condition of Repp, 1986) and vowel
portions whose initial 10 ms (one male or two female pitch pulses) had been removed.5 The
first test sequence contained the vowel portions in isolation and immediately preceded by 1, 3,
6, or 12 segments of matched or mismatched murmur. The murmur durations thus were in the
vicinity of 10, 30, 60, and 120 ms. The mismatched murmurs came from the syllable with the
same vowel but a different consonant, produced by the same speaker. The second, shorter test
sequence contained only isolated murmurs of 30, 60, and 120 ms duration. (The 10-ms murmurs
were omitted because they were easily missed in listening.)
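
The construction of the first test sequence can be sketched as follows. This is an illustrative reconstruction under simple assumptions (waveforms as lists of samples, a 10-ms excerpt that is repeated without any smoothing at the joins); the names are hypothetical and the original editing procedure may have differed in detail.

```python
SEGMENT_MS = 10              # length of the reiterated murmur excerpt
REPETITIONS = (1, 3, 6, 12)  # roughly 10, 30, 60, and 120 ms of murmur

def steady_state_murmur(segment, repetitions):
    """Build a steady-state murmur by repeating one short excerpt."""
    return list(segment) * repetitions

def build_conditions(vowel, matched_segment, mismatched_segment):
    """Return the nine versions of one syllable used in the long sequence,
    keyed by (murmur type, nominal murmur duration in ms)."""
    stimuli = {("none", 0): list(vowel)}  # isolated vowel portion
    for reps in REPETITIONS:
        duration = reps * SEGMENT_MS
        stimuli[("matched", duration)] = (
            steady_state_murmur(matched_segment, reps) + list(vowel))
        stimuli[("mismatched", duration)] = (
            steady_state_murmur(mismatched_segment, reps) + list(vowel))
    return stimuli
```

One isolated vowel plus four durations of matched and of mismatched murmur gives nine versions per syllable, which accounts for the 9 x 36 = 324 stimuli of the long sequence.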

B. Results and Discussion 

The overall results are shown in Figure 1. The figure plots percent correct scores as a function
of murmur duration for isolated murmurs and for murmur-vowel stimuli with matched and with
mismatched components. (In the case of mismatched components, "correct" responses are defined
with respect to the vowel portion.) The data point on the ordinate, corresponding to zero murmur
duration, represents the score for isolated vowels (72% correct). The results indicate that addition
of a 10-ms matched or mismatched murmur to the vowel changed identification performance little,
whereas addition of a murmur 30 ms long or longer resulted in an improvement, but only if the
murmur matched the vowel. Mismatched murmurs neither improved nor hindered identification.
Isolated murmurs of 30 and 60 ms duration were identified at levels above chance, but 120-ms
murmurs could not be reliably identified. This last finding (which may have been a consequence
of the artificial steady-state nature of the murmurs; cf. Repp, 1986) contrasts with the differential
effect of 120-ms matched and mismatched murmurs when they preceded a vowel.

A two-way analysis of variance of the scores for the murmur-vowel stimuli yielded a significant
effect of match/mismatch, F(1, 11) = 19.01, p = .0011; F(1, 4) = 13.31, p = .0218, and a significant
interaction with murmur duration, F(3, 33) = 6.34, p = .0016; F(3, 12) = 4.28, p = .0285,
obviously due to the shortest murmur duration, whereas the main effect of murmur duration was
not significant. A separate analysis of the isolated murmurs showed a significant effect of murmur
duration, F(2, 22) = 3.98, p = .0335; F(2, 8) = 8.85, p = .0094, suggesting that the performance
decrease for the longest murmurs was real.

5 The artificial murmurs were used to have better control over murmur duration and amplitude
contour, and slightly truncated vowels were employed to avoid ceiling effects in performance. The
truncation was less than in Experiments 1-3, but for no stringent reason; as before, it was assumed
that truncation would merely reduce the information available without changing basic auditory
and perceptual processes.

Figure 1. Results of Experiment 4: Percent correct identification of isolated vowels (V), isolated murmurs (M),
and murmur-vowel stimuli (M + V) with matched and mismatched components as a function of murmur duration.

These overall results cannot be given much weight, however, in view of very striking dependencies
on vocalic context. The pattern of results for individual syllables is shown in Figure 2.
Each panel shows data for one vocalic context, with solid and open symbols representing [m-]
and [n-] syllables, respectively. Consider first the [-a] and [-u] syllables (left and right panels).
The isolated vowels of [na] and [nu] were identified much more accurately than those of [ma] and
[mu], which replicates earlier findings (Experiments 1 and 2; Repp, 1986) and probably reflects
the greater perceptual salience of alveolar than labial formant transitions (or onset spectra).
Because of this pattern, a [(m)a] or [(m)u] vowel benefited from addition of a murmur (even a 10-ms
one) while a [(n)a] or [(n)u] vowel did not. Identification performance was uniformly high for all
murmur-vowel stimuli in [-a] and [-u] context. Moreover, there was very little difference between
scores for stimuli with matched and mismatched components. Identification of [(m)a] and [(m)u]
vowels was improved by addition of a mismatched murmur almost as much as by addition of a
matched murmur, and identification of [(n)a] and [(n)u] vowels was at least not hampered by
addition of a mismatched murmur.

This part of the data is consistent with the adaptation hypothesis. As to the predicted effects
of murmur duration, they are smaller than expected but are also compatible with the hypothesis.
The results are inconsistent with the independent cues hypothesis, according to which performance
should have decreased in the mismatched conditions.

Figure 2. Results for individual syllables in Experiment 4.

The pattern for [mi] and [ni] stimuli (center panel of Figure 2) is very different from the
results just described. Identification of isolated vowels and isolated murmurs was extremely poor,
in agreement with earlier results. Addition of a 10-ms murmur to the vowel had no effect, but
addition of a murmur 30 ms or more in duration elicited responses that reflected the nature of the
murmur. Thus there was a large effect of match versus mismatch, which accounts for the average
effect shown in Figure 1. Since the murmurs were barely discriminable in isolation, especially when
they were 120 ms long, listeners cannot have relied on them directly to identify the consonant in
these syllables. The data thus support the earlier conclusion (Repp, 1986) that [-i] syllables are
special in that place of articulation information lies almost entirely in the relationship between
the murmur and the vowel, that is, in the pattern of spectral change. A possible implication is
that there are differences between [m(i)] and [n(i)] murmurs that are difficult to detect in isolation
but that become perceptually salient when the murmur is followed by a vowel. Such a retroactive
enhancement effect would refute the adaptation hypothesis. Yet there is a way in which it could
arise through adaptation: Different murmurs might impose their inverse spectrum on the vowel
onset, thereby creating a place of articulation cue following the release. On the other hand,
the independent cues hypothesis, unless it is extended to include relational information, cannot
explain how murmurs that are uninformative in isolation convey phonetic differences in context.

In summary, while the results of Experiment 4 argue very clearly against the independent 
cues hypothesis and thereby affirm the importance of relational spectral information, they are 
perhaps still compatible with a peripheral account of spectral difference detection.

VI. EXPERIMENT 5

Prior to Experiments 6 and 7, which attempted to test the adaptation hypothesis in yet
another way, Experiment 5 examined two alternative explanations of how a preceding murmur
might enhance the perception of vowel onset cues. One hypothesis (Repp, 1986) takes account
of the fact that the murmur is the major carrier of nasal manner information. If it were the
case that place of articulation perception is not independent of manner perception (see Carden,
Levitt, Jusczyk, & Walley, 1981; Miller, 1977), then hearing the correct manner may enhance the
accuracy of place identification. Kurowski and Blumstein (1984) reported that their CV syllables
were identified as beginning with oral stop consonants when the nasal murmur was excised. Their
subjects chose from the response set /m, n, b, d/ and gave about 84% /b, d/ responses to
murmurless stimuli but only about 12% to stimuli with an initial murmur. Thus removal of the
nasal murmur clearly changed manner of articulation perception and perhaps affected place of
articulation perception as well, particularly since the isolated vowel portions of nasals lack the
release bursts commonly associated with oral stop consonants. In Repp's (1986) experiments,
and in Experiments 1-4, subjects always were required to make a forced choice between /m/ and
/n/, regardless of whether they perceived the correct manner or indeed any consonant at all. One
purpose of Experiment 5 was to determine first whether the present stimuli resembled those of
Kurowski and Blumstein (1984) in that removal of the murmur resulted in the almost complete
loss of nasal manner cues, and then whether correct perception of place was contingent on correct
perception of manner.

A second hypothesis addressed by Experiment 5 derives from observations by Pols and
Schouten (1978, 1981) on the perception of truncated stop-consonant-vowel syllables. These
authors argued that the relatively abrupt stimulus onset following truncation causes spectral
splatter (a "click sensation") that interferes with the perception of place of articulation cues.
Identification scores improved substantially when the truncated syllables were preceded by noise
bursts that masked the abrupt onset (Pols & Schouten, 1978). Ohde and Sharf (1981) applied a
smoothing function to the onsets of truncated CV syllables, apparently with similar results (see
Pols & Schouten, 1981). It is possible that part of the intelligibility decrement for isolated vowel
portions in Experiments 1-4 was caused by abrupt stimulus onsets. To check on this, a smoothing
function similar to that used by Ohde and Sharf (1981) was applied to the stimulus onsets on
half the trials in this experiment.

A. Methods 

The experimental tape contained 8 x 36 = 288 isolated vowel stimuli in random order. Each 
vowel was truncated approximately 0, 10, 20, and 30 ms after the release (see Repp, 1986); 
thus none of them contained any nasal murmur. (It was quite clear from informal listening 
that inclusion of even a very brief murmur resulted in the perception of a nasal consonant.) 
Each truncated stimulus occurred in two versions, one unaltered and the other with a linear
amplitude ramp, rising from near-zero to full intensity in 10 ms, applied to the onset of the
digitized waveform. The subjects' task was to report for each stimulus the initial consonant they
heard, choosing from the set /m, n, b, d/, and to write down a dash when no consonant was
heard. 
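
The onset tapering described above can be illustrated with a minimal sketch. The function name, the 10-kHz sampling rate used in the usage note, and the choice of a gain that starts at exactly zero are assumptions made for illustration only; the procedure actually used is not described in further detail.

```python
def apply_onset_ramp(samples, sample_rate_hz, ramp_ms=10.0):
    """Return a copy of `samples` whose first `ramp_ms` milliseconds are
    scaled by a gain that rises linearly from 0 to 1 (hypothetical helper)."""
    # Number of samples covered by the ramp, never more than the signal length.
    n_ramp = min(len(samples), int(round(sample_rate_hz * ramp_ms / 1000.0)))
    ramped = list(samples)
    for i in range(n_ramp):
        # Gain is 0 at the first sample and reaches 1 at sample n_ramp.
        ramped[i] = samples[i] * (i / n_ramp)
    return ramped
```

With an assumed sampling rate of 10 kHz, `apply_onset_ramp(waveform, 10000)` tapers the first 100 samples and leaves the remainder of the waveform untouched.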

B. Results and Discussion 

The overall results, averaged over the ramped and unramped stimulus versions, are shown
in Figure 3. Three measures were derived from the data. The first, p(C), was the percentage of
trials on which a consonant was reported. Not surprisingly, it declined with progressive truncation,
F(3,36) = 47.05, p < .0001; F(3,15) = 94.55, p < .0001, although the vowel portions were still
heard as containing initial consonants on about half the trials even after their initial 30 ms had
been deleted. The other two measures were conditional on a consonant being reported. The
percentage of correct place identifications, p_c(P|C), declined only very slightly with truncation,
F(3,36) = 2.17, p = .1081; F(3,15) = 6.45, p = .0051, suggesting that the decrease in two-alternative
forced-choice identification scores with progressive truncation (Repp, 1986) was caused
more by the total loss of consonantal cues than by misleading residual cues. Most interestingly, the
percentage of nasal consonant responses, p(N|C), did not decline at all with progressive truncation,
but actually showed an initial increase, F(3,36) = 9.72, p = .0001; F(3,15) = 10.22, p = .0006.
Regardless of how much consonantal information was available, about half of the consonants
reported were nasals. This percentage is much higher than that reported by Kurowski and
Blumstein (1984), even though removal of the nasal murmur undoubtedly caused a significant
loss of nasal manner information. Presumably, the talker used by Kurowski and Blumstein closed
his velum more rapidly after the consonantal release than did the present talkers, who tended to
nasalize the vowel onset.

Figure 3. Results of Experiment 5: Percentages of consonant responses, p(C), of correct place of articulation
identifications given a consonant response, p_c(P|C), and of nasal consonant responses given a consonant response,
p(N|C), as a function of cutpoint location.
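
How the three response measures relate to the raw responses can be shown with a short sketch. The data layout (pairs of true place and response symbol) and the function name are assumptions introduced here for illustration; only the definitions of p(C), p_c(P|C), and p(N|C) are taken from the text.

```python
def response_measures(trials):
    """Compute p(C), p_c(P|C), and p(N|C) as percentages.

    trials: list of (true_place, response) pairs, where true_place is
    'labial' or 'alveolar' and response is 'm', 'n', 'b', 'd', or '-'
    (dash = no consonant heard). This layout is a hypothetical one.
    """
    place_of = {'m': 'labial', 'b': 'labial', 'n': 'alveolar', 'd': 'alveolar'}
    nasals = {'m', 'n'}
    nan = float('nan')

    # Trials on which any consonant was reported.
    consonant_trials = [(p, r) for p, r in trials if r != '-']
    n_all, n_cons = len(trials), len(consonant_trials)
    pct_consonant = 100.0 * n_cons / n_all if n_all else nan
    if n_cons == 0:
        return pct_consonant, nan, nan

    # Both conditional measures use only the consonant trials as denominator.
    n_place_ok = sum(1 for p, r in consonant_trials if place_of[r] == p)
    n_nasal = sum(1 for _, r in consonant_trials if r in nasals)
    return pct_consonant, 100.0 * n_place_ok / n_cons, 100.0 * n_nasal / n_cons
```

Note that a /b/ response to a truncated [ma] counts as a correct place identification but not as a nasal response, which is what allows place and manner perception to be examined separately.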

Differences among individual syllables are shown in Figure 4. With regard to the percentage 
of consonant responses (left panel), it can be seen that [mu] and [ni] were affected much more by 
excision of the murmur (0 ms cutpoint) than the other syllables. This probably reflects the weak 
formant transitions in these stimuli, which have similar articulatory configurations for consonant
and vowel. Further truncation had especially strong effects on [ma] and [mi], indicating the loss 
of rapid labial transients at stimulus onset. Perception of the consonants in [na] and [nu], which 
have relatively long vocalic formant transitions, was most resistant to vowel truncation. 

The most striking difference in correct place of articulation identification scores (center panel)
was between [ni] and all other syllables. Without the murmur, [ni] tended to be misidentified as
labial, which indicates that the vowel did not contain any useful formant transition information.
The same may well be true for [mu], and the 70-80% labial responses to both of these syllables
may represent a bias to respond with labial consonants in the absence of clear place of articulation
cues. Only [ma] and [mi] were affected by vowel truncation.

Figure 4. Results for individual syllables in Experiment 5.

The percentage of nasal responses (right panel) was lower for [mi] and [ni] than for the other 
syllables. The difference between [-i] and [-a] syllables may be explained by the fact that the 
velum is raised faster for high than for low vowels following a nasal consonant (Bell-Berti, Baer, 
Harris, & Niimi, 1979), making the former less nasalized. It is not clear, however, why the [-u] 
syllables resembled more the [-a] than the [-i] syllables in degree of perceived nasality, or why 
perception did not fully compensate for the expected differences in velar elevation for vowels of 
different heights (see Abramson, Nye, Henderson, & Marshall, 1981). 

The principal hypothesis addressed by this experiment concerned the possible dependence 
of place perception on manner perception. Since only about half of the initial consonants per- 
ceived in truncated syllables were nasal, it is indeed possible that place of articulation perception 
suffered because of insufficient manner cues. If so, place identification contingent on correct per- 
ception of nasal manner should have been more accurate than place identification contingent on 
perception of non-nasality. Examination of these percentages (computed from the syllable aver- 
ages), however, revealed only a small difference (2% on the average) in the predicted direction. 
This difference, moreover, derived entirely from the stimuli with tapered onsets (5.5% average 
difference); for the others, there was a 1.6% difference in the opposite direction. Although the 
effect of amplitude tapering deserves attention (see below), all stimuli in earlier experiments were, 
of course, untapered. For those stimuli, then, there is no evidence that incorrect perception of 
manner impaired place of articulation identification, so the perceptual enhancement of place cues 
when a vowel is prefixed with a murmur cannot be explained on that basis. 

It is noteworthy, however, that there were very large differences among individual syllables. 
The differences between correct place identification scores contingent on perceived nasal and non- 
nasal manner were: -2.8% for [ma], -18.3% for [mi], -28% for [mu], 12.5% for [na], 42% for [ni], 
and 6.3% for [nu]. It thus appears that, when a consonant was perceived as non-nasal, there was a 
strong shift in favor of labial responses; the differences in absolute magnitude of this shift among 
the six syllables probably derived largely from ceiling effects. Thus there was a dependency of 
place of articulation identification on manner, though in terms of criterion rather than accuracy.
This effect is in agreement with earlier findings (Larkey, Wald, & Strange, 1978; Miller, 1977) of
a relative shift in the category boundary on synthetic /ba/-/da/ and /ma/-/na/ continua. One
likely cause for this is the absence of release bursts in both the synthetic stop-consonant-vowel
stimuli used previously and in the present vowel portions. In real speech, alveolar oral stops have 
stronger release bursts than do labial oral stops, so the absence of bursts promotes the perception 
of labials, provided that a stop consonant is perceived. 

Turning finally to the effect of amplitude tapering, there were small but consistent effects 
on two of the three overall performance measures. The percentage of consonant responses was
reduced by about 7% at all stages of truncation, F(1,12) = 16.19, p = .0017; F(1,5) = 6.12, p =
.0563, which suggests a loss of general manner cues at stimulus onset. Given that a consonant was
heard, however, place of articulation identification was improved by about 5% overall, F(1,12) =
7.60, p = .0174; F(1,5) = 11.17, p = .0205. This effect is in agreement with the observations of
Pols and Schouten (1981) on the interfering effect of abrupt stimulus onsets, although the size of
the present effect was rather small, certainly much smaller than the improvement obtained by
Pols and Schouten (1978) with a noise prefix. Actually, the present improvement derived solely 
from those trials on which nasal consonants were perceived (cf. the interaction reported above); 
when nasality was not perceived, there was no effect of tapering. This is less in agreement with 
Pols and Schouten. Onset tapering had no systematic overall effect on nasal manner perception. 

In summary, the results of this experiment do not support the hypothesis that, when a vowel 
is preceded by its original murmur, part of the improvement in place of articulation identification 
derives from the restoration of correct manner identification. Perception of nasal manner does 
not seem to enhance perception of place, at least not in untapered stimuli as used previously; 
it only shifts the response criterion in favor of alveolar responses. The second hypothesis, that 
elimination of abrupt onsets improves place perception, receives some limited support from the 
present results. Though the effect is rather small, it may add to the contribution of a preceding
murmur. However, it cannot explain correct perception of the intact syllable [ni], or of [mi] with
truncated vowel, for which the murmur and the vowel in isolation are equally uninformative.
The concept of relational information is still required, and so we must return to the adaptation 
hypothesis. 

VII. EXPERIMENT 6



The final two experiments in this series provided perhaps the most direct test of the adapta- 
tion hypothesis. If peripheral adaptation by the murmur enhances spectral information at vowel 
onset, then it should be possible to simulate this enhancement by filtering the vowel onset in the
absence of a preceding murmur. Such artificial enhancement then should result in improved place 
of articulation identification from isolated vowel components. Confirmation of this prediction 
would not only provide strong support for the adaptation hypothesis, but it would also lead to 
a re-evaluation of earlier conclusions based on place of articulation identification from isolated 
vowel portions (Experiments 1, 2, and 5; Kurowski & Blumstein, 1984; Repp, 1986), which did
not consider that removal of a murmur also eliminates its adaptive aftereffect.

In choosing an appropriate filtering function, decisions had to be made concerning its shape, 
depth, and decay over time. Acoustical analysis of the nasal murmurs indicated that most of 
their energy was below 1000 Hz, and that the peak corresponding to the first formant was about 
30 dB higher, on the average, than the peaks of the higher formants above 1000 Hz. Only the 
higher formants, however, varied with place of articulation. Ideally, the spectral shape of the filter
should initially mirror that of the natural murmur and then wane over time, simulating decay of
auditory adaptation. These objectives were difficult to achieve simultaneously with the facilities
available. In Experiment 6, therefore, it was decided to use a simple high-pass filter with a cutoff
frequency of 1000 Hz, which permitted variable stop-band attenuation to simulate decay. The 
experiment thus tested one specific version of the adaptation hypothesis, viz., that enhancement 
of place cues in higher formant transitions at vowel onset results from suppression of energy in 
the region of the first formant. As to the decay time, it was assumed that it would be rather 
short during stimulation by the vowel itself. (Most estimates of decay times in the literature 
derive from observations during silent intervals.) Even if the range chosen (up to 30 ms) seems 
too short, it became clear during stimulus preparation that more extensive filtering led to very 
unnatural-sounding stimuli. 

A. Methods 

The basic stimuli were the complete vowel portions of the original 36 syllables. Even though 
ceiling effects in performance were expected to limit the sensitivity of the experiment to beneficial 
(but not detrimental) effects of filtering, no truncation was performed on the vowels in this
study and the next, so as to preserve the original acoustic properties of the vowel onsets. Three 
degrees of high-pass filtering were imposed on initial pitch-pulse segments, leaving the rest of the 
waveform intact: (1) the initial 10-ms segment only, with 10 dB stop-band attenuation; (2) the 
initial segment with 20 dB, and the following segment with 10 dB stop-band attenuation; (3) 
the initial segment with 30 dB, the following segment with 20 dB, and the final segment with 
10 dB stop-band attenuation. Thus, three degrees of adaptation with three decay times were 
crudely simulated. The filtering was performed digitally, using an eighth-order elliptic filter with 
a fixed cut-off frequency of 1000 Hz and variable attenuation, constructed by the EFI subroutine 
of the ILS package (Version 4.0, Signal Technology, Inc.). The boundaries of the pitch pulse(s)
to be filtered in each pass through the routine were specified precisely in tenths of milliseconds, 
according to Repp's (1986) cutpoint markers. The result was verified through inspection of 
waveforms and acoustic analysis. The four series of 36 stimuli (three filtered, one unaltered) 
were randomized together. Subjects were instructed to identify each stimulus as beginning with 
/m,n,b,d/ or /-/ (no consonant). 
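
The three filtering conditions follow a simple staircase: in condition k, the k initial segments receive stop-band attenuations of 10k, 10(k-1), ..., 10 dB. The sketch below merely tabulates that schedule and converts decibels to a linear amplitude factor; the function names are hypothetical, and the elliptic filter itself (the EFI subroutine) is not reproduced here.

```python
def attenuation_schedule(degree, step_db=10):
    """Stop-band attenuation (dB) for each successive filtered segment.

    degree = number of initial pitch-pulse segments that are filtered
    (1, 2, or 3 in the experiment; 0 corresponds to the unaltered stimulus).
    """
    if degree < 0:
        raise ValueError("degree must be non-negative")
    # The first segment is attenuated most; attenuation then steps down to step_db.
    return [step_db * (degree - i) for i in range(degree)]


def db_to_gain(attenuation_db):
    """Linear amplitude factor corresponding to an attenuation in dB."""
    return 10.0 ** (-attenuation_db / 20.0)
```

For example, `attenuation_schedule(3)` gives `[30, 20, 10]`, matching condition (3), and a 20-dB stop-band attenuation corresponds to scaling the amplitude of the attenuated components by a factor of 0.1.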

B. Results and Discussion 

The overall results are shown in Figure 5 in terms of the three performance measures introduced
in Experiment 5. Looking first at the p(C) scores, it can be seen that, in agreement
with the results of Experiment 5, the unaltered syllables elicited close to 80% consonant responses.
This percentage declined to 65% with progressive filtering: F(3,36) = 18.47, p < .0001; F(3,15) =
14.49, p = .0001, suggesting that the first formant contributed general consonant manner information.
A decline with respect to the unaltered stimuli was also observed in the conditional percentage
of nasal consonant responses, p(N|C), F(3,36) = 5.33, p = .0038; F(3,15) = 6.86, p = .0039,
although it did not seem to depend on the extent of filtering. Most importantly, the conditional
percentage of correct place of articulation identifications, p_c(P|C), also declined, rather than
increased, with increasing extent of filtering. Although absence of an increase in performance
could be blamed on ceiling effects, and although the decline is rather small and nonsignificant,
these data offer no support for the hypothesis that attenuation of irrelevant low-frequency energy
enhances place-of-articulation cues at higher frequencies.

Figure 5. Results of Experiment 6: Three performance measures as a function of temporal extent of high-pass
filtering.

Figure 6 shows the results for individual syllables. In the left panel it can be seen that 
consonant responses decreased most strongly for [ma] and [mi], whereas [mu] actually showed an 
increase with filtering. Place perception suffered in all syllables but the poorly identified [ni], for 
which there was an increase with filtering. Since identification of this syllable never exceeded
chance level, the increase is probably a criterion effect. Perception of nasality suffered in all
syllables but [ma], which showed an increase with filtering. These interactions are curious, but
they do not change the general conclusions. 

VIII. EXPERIMENT 7 



The results of Experiment 6 lend no support to the specific hypothesis that auditory adaptation
enhances place of articulation perception through elimination of irrelevant low-frequency
spectral energy. It is still possible, however, that a beneficial effect of adaptation occurs at higher 
frequencies, where the important place of articulation cues reside. To test this version of the 
adaptation hypothesis, it was necessary to use a filter that preserves the detailed spectral shape 
of the murmur, with some loss of flexibility in other respects. 



A. Methods 

Figure 6. Results for individual syllables in Experiment 6.

From each of the 36 original murmurs, a 14-coefficient LPC spectrum was computed using a
25.6 ms Hamming window ending about 10 ms before the point of release (ANA program of the
ILS package). Each of these spectra was subsequently used as an inverse filter on the complete
vowel portion of each syllable (FLT program). Degree of attenuation could not be varied easily in
this procedure. To vary temporal extent in synchrony with pitch pulses, which could not be done
directly, the initial one, two, or three pitch-pulse segments of the filtered vowel (about 10, 20,
and 30 ms long) were concatenated with the remainder of the unfiltered vowel, using a waveform
editing program. The success of the filtering procedure was verified by acoustic analysis. The
resulting 4 x 36 stimuli (including the unaltered versions) were recorded in a randomized sequence.
The subjects' instructions were the same as in Experiment 6. Two additional sequences of 36
stimuli each were recorded afterwards, each containing the excerpted initial 30-ms segments of
the vowels, first unfiltered and then filtered. The purpose of this was to assess to what extent
any perceptual effects of filtering depended on the following unfiltered vowel or were artifacts of
the abrupt amplitude change between filtered and unfiltered waveform segments. In responding
to these final two sequences, subjects had to make a forced choice between /m/ and /n/ for each
stimulus.
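
The principle of LPC inverse filtering can be sketched as follows. This is a generic textbook formulation (autocorrelation method with the Levinson-Durbin recursion), not the ANA and FLT programs of the ILS package, whose internals are not described here; all function names are hypothetical.

```python
import math


def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(frame, max_lag):
    """Autocorrelation of `frame` for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(frame, order):
    """Levinson-Durbin recursion; returns [1, a1, ..., a_order] of A(z)."""
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g., silent) frame: stop early
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err  # reflection coefficient for this stage
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= (1.0 - k_m * k_m)
    return a


def inverse_filter(signal, a):
    """Apply the FIR filter A(z): e[n] = sum over k of a[k] * x[n - k]."""
    return [sum(a[k] * signal[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(signal))]
```

In terms of this sketch, the procedure in the text would correspond to estimating 14 coefficients from a Hamming-windowed murmur frame, e.g. `a = lpc_coefficients([w * x for w, x in zip(hamming(len(frame)), frame)], 14)`, and then calling `inverse_filter(vowel, a)`, so that spectral peaks present in the murmur are flattened in the vowel.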



B. Results and Discussion 

Figure 7 shows the overall results for the main test. It can be seen that the pattern was rather 
similar to that obtained with high-pass filtering (Figure 5). Consonant responses increased slightly 
initially but then decreased with increasing filtering: F(3,36) = 13.98, p < .0001; F(3,15) =
8.91, p = .0012. Nasal consonant responses dropped considerably with minimal filtering and then
recovered partially as filtering increased: F(3,36) = 26.79, p < .0001; F(3,15) = 16.45, p = .0001.
Correct place of articulation responses were not significantly affected, but certainly showed no
tendency to increase. The results for the isolated 30-ms segments likewise showed no advantageous
effects of filtering: Forced-choice identification scores were approximately 66% and 64.3% for unfiltered and
filtered excerpts, respectively, a nonsignificant difference.

Scores for individual syllables are shown in Figure 8. It can be seen that consonant responses
increased initially for [mu] and [ni], suggesting that an initial amplitude discontinuity provides
a general consonant manner cue. With more extensive filtering, however, the cue lost its effectiveness,
and consonant scores declined for all syllables. Place of articulation identification was
strikingly improved by filtering for one syllable, [ni], but it decreased for [mi] and [mu]. The
opposite effects of filtering on [mi] and [ni] suggest that, rather than improving place of articulation
perception, the filtering introduced a bias to perceive /n/. No striking differences among
individual syllables were observed with regard to perception of nasal manner.

Figure 7. Results of Experiment 7: Three performance measures as a function of temporal extent of inverse
filtering.

Figure 8. Results for individual syllables in Experiment 7. (Panels: p(C), p_c(P|C), p(N|C); abscissa: extent of
filtering, 0-30 ms.)






In summary, these results do not support the adaptation hypothesis. It is possible, of course, 
that perceptual benefits of spectral enhancement are obtained only when a murmur is physically 
present. If so, however, the implication would be that the crucial spectral relationships are 
computed at a higher level, rather than being directly available in the auditory system.

IX. SUMMARY AND CONCLUSIONS 

As was already clear from earlier research, the murmur and vowel portions of nasal-consonant- 
vowel syllables do not make independent contributions to place of articulation perception; their 
relationship also plays a role. (For a recent convincing demonstration of the general importance 
of spectral change information in speech perception, see Furui, 1986.) This finding, which is 
strongly supported by the present results, argues against models of perceptual integration based 
on spectrographically defined cues, which do not take relational information into account. Such 
models have, more or less explicitly, formed the basis of much past research on speech perception 
(e.g., Massaro & Oden, 1980; Repp, 1982). While they may be accurate when the cues represent 
different (e.g., spectral vs. temporal) aspects of the speech signal, they need to be augmented by 
a relational term when both cues are from the same physical dimension. 

The focus of the present series of experiments was the question of how listeners extract spec- 
tral relationships from the acoustic signal. That the auditory system computes some kind of 
running Fourier transform of the input has been an unquestioned underlying assumption. Given 
this assumption, there are two ways in which a listener may derive relational spectral information:
directly, through auditory transforms caused by peripheral adaptation, or indirectly, through a 
central comparison of the spectra of successive signal portions. These two processes are not mu- 
tually exclusive: Although central comparisons seem superfluous after peripheral processes have 
done the work, they may substitute for peripheral processes that are artificially disrupted, and 
they may also serve to compute higher-order patterns of change (e.g., the second derivative of 
the input). The effect of adaptation in nasal-consonant-vowel syllables would be to enhance the 
spectral change at vowel onset and beyond. According to the strong version of the adaptation 
hypothesis espoused by Kurowski and Blumstein (1984), the resulting direct auditory representa- 
tion of the spectral relationship would be the one and only place of articulation cue, making any 
further integration higher up in the system unnecessary. According to a weaker version of the hy- 
pothesis, the information obtained from the modified vowel onset is combined with cues obtained 
independently from preceding and following signal portions. The weaker version was considered 
more realistic because human listeners clearly have the ability to combine multiple sources of 
information and will make use of that ability whenever multiple sources are available. Peripheral 
auditory processes do not seem to have the integrative power to combine temporally distributed 
phonetic information. On the contrary, it was argued that adaptation helps differentiate the
signal into contrasting auditory components. 

From a review of the physiological and psychoacoustic literature it was concluded that short- 
term adaptation almost certainly does take place in the human auditory system during speech 
perception. The internal representation of the auditory signal from which phonetic information 
is derived, particularly at points following rapid spectral change, is therefore different from the 
one visible in a spectrogram or oscillogram. However, does adaptation have any consequences for
the intelligibility of speech? Summerfield et al. (1984) have pointed out some putative general
advantages, such as improvement of the signal-noise ratio, but such advantages exist only relative
to a hypothetical auditory system or speech recognition device in which no adaptation occurs.
The former may not exist, since adaptation may well be a general design feature of neural systems.
As to the latter, it should be noted that adaptation can only enhance existing spectral change, not 
create it. Its perceptual effect is thus comparable to a lowering of the threshold for spectral change 
detection on an arbitrary scale, which a machine can easily emulate, and whose net effect is zero. 
Thus, there is perhaps no real "advantage" to be had from adaptation and spectral enhancement, 
except perhaps when the spectral change is right at the detection threshold. Similar conclusions 
have been drawn from studies of the effects of bandwidth narrowing and spectral enhancement on 
speech intelligibility in the hearing-impaired (Leek, Dorman, & Summerfield, 1987; Summerfield, 
Foster, & Tyler, 1985). 

It is still meaningful, however, to ask whether any perceptual disadvantage results from a 
reduction of adaptation, achieved by stimulus manipulations in the laboratory. The problem here 
is that such manipulations may have repercussions at all levels of the system, so it is not clear 
whether a performance decrement results specifically from the absence of peripheral spectral enhancement
or from interference with a more central process of spectral comparison or integration.
This problem beset Experiments 1-3, in which auditory short-term adaptation was interfered with
and identification performance decreased accordingly. Had it not decreased at all, this would have
been evidence that adaptation plays no role in the perception of prevocalic nasal consonants. As 
it was, the only indication that adaptation is perhaps unimportant was the rather small decrease 
in intelligibility consequent upon temporal separation of murmur and vowel portions (Experiment 
3). 

Experiment 4 added two other relevant findings. Reduction of murmur duration, which presumably
diminished the degree of adaptation, caused a performance decrement, but only at the
very shortest duration. Although a ceiling effect may have imposed some limits, this finding is 
somewhat unfavorable to the adaptation hypothesis. The other finding was that mismatched 
murmurs did not lead to a performance decrement in [-a] and [-u] syllables, which confirmed 
a prediction of the adaptation hypothesis. A very different result was obtained with [-i] sylla- 
bles, however, which was more difficult (but not impossible) to reconcile with the adaptation 
hypothesis. All in all, the hypothesis emerged relatively unscathed from Experiments 1-4. 

Experiment 5 considered two alternative hypotheses, neither of which received much support. 
First, place of articulation perception was no more accurate for stimuli whose nasal manner 
was correctly perceived. Second, smoothing the abrupt stimulus onset caused by removal of 
the murmur engendered only a small improvement in identification performance — not enough to 
account for the high intelligibility of combined murmur and vowel onset cues. 

The adaptation hypothesis was still viable at this point. Experiments 6 and 7, however, 
yielded results that were clearly contrary to its predictions: A simulation of spectral enhancement
at the onset of isolated vowel portions generally harmed, rather than improved, place of
articulation identification. It may be argued that the situation was too artificial, and that spectral
change information can be utilized only when the signal portion preceding the change (the
murmur) is physically present. This objection, however, would be tantamount to saying that
spectral change information is obtained by a more central computational process, rather than by
peripheral adaptation. Or, in other words, it is the spectral change itself that is perceptually 
important, and not its auditory transformation through adaptation. 

To compute the relationship between two stimulus components, it seems necessary that
relatively analog representations of these components be available to the central nervous system.
Once the murmur has been processed separately and encoded as a vector of categorical possibilities 
(Chistovich, 1985; Massaro & Oden, 1980), there is no way of recovering spectral relationships 
during processing of the vowel. This consideration points to auditory memory as a mediator in 
the central perceptual integration of stimulus components. That is, listeners may be able to hold 
on to a relatively faithful auditory representation of the nasal murmur even across a stretch of 
intervening noise or silence, and to compare that memory trace to the vowel onset spectrum. 
Moreover, even though the temporal separations employed in the present experiments are within 
the range of short-term auditory storage (Cowan, 1984), it seems likely that listeners rely on long- 
term auditory storage in making spectral comparisons, one reason being that the vowel would 
tend to "overwrite" the murmur in a sensory buffer (Cowan, 1984). Long-term auditory storage 
may last for a number of seconds, depending on the amount of detail to be retained. Even a 
life span of one second would be more than sufficient to account for the findings of the present 
study. This explanation is consistent with the very gradual decline in performance as a function 
of temporal separation. 

Why are the murmur and vowel components integrated at all? The auditory adaptation 
hypothesis advanced by Kurowski and Blumstein (1984) was an attempt to provide a low-level 
explanation: Integration is assumed to occur because of general principles of auditory processing,
and the speech perceiver merely needs to "pick up" the neatly parceled, unitary auditory
properties to arrive at phonetic judgments. It seems, however, that auditory operations alone
are insufficient to account for the perceptual integration of speech components. Indeed, it is not
the signal portions themselves that are integrated (i.e., they remain audible as separate auditory
events; this is even more obvious in the case of fricative-vowel syllables, for example) but the
information they convey. The information, to deserve that name, must inform the listener about
some event he or she has learned (or was born) to recognize. The rationale for information integration
thus must be sought in the listener's mental representations of common speech patterns,
which in turn reflect the regular occurrences of acoustic (and articulatory) events in speech production
(see also Repp, 1987a, 1987b). That is, the cues provided by the nasal murmur and by the
following vowel are "integrated" because they, and their relationship (i.e., the pattern of spectral
change reflecting articulatory movement), all contribute information about place of articulation
of prevocalic consonants, and because listeners know this from long experience with speech as
individuals and as members of the human species. In other words, the perceptual integration
of the articulatory information conveyed by auditorily distinct speech components is a centrally
guided, not a peripheral phenomenon. It reflects the listener's knowledge of the way speech is
patterned, not principles governing the operation of the auditory system. 

DIFFERENCE IN SECOND-FORMANT TRANSITIONS BETWEEN
ASPIRATED AND UNASPIRATED STOP CONSONANTS PRECEDING [a]*

Bruno H. Repp and Hwei-Bing Lin†

Abstract. Perceptual experiments with synthetic speech have shown 
that the category boundary on an acoustic [pa]-[ta] (/ba/-/da/) continuum
(obtained by varying the onset frequencies of the second and
third formants) is closer to the labial endpoint than the boundary on
a [pʰa]-[tʰa] (/pa/-/ta/) continuum. Of several possible explanations,
the most plausible seems to be that natural unaspirated and aspirated
stops have different formant transitions. To supplement limited data
on this point in the literature, we conducted an acoustic analysis of
CV syllables produced by 10 male speakers of American English. The
results show very clearly that the second formants of [pʰa] and [tʰa]
start 100-200 Hz higher than those of [pa] and [ta] and reach compa- 
rable frequency values only at voicing onset. This difference, which 
is probably an acoustic consequence of subglottal coupling during as- 
piration, seems to be part of a listener's tacit knowledge of phonetic 
regularities and thus explains the perceptual boundary shift. It also 
needs to be taken into account in realistic speech synthesis. 

Introduction 

A highly reliable finding of perceptual studies using synthetic CV syllables forming place of 
articulation continua is that the category boundary on an unaspirated [pa]-[ta] (i.e., /ba/-/da/) 
continuum is closer to the labial endpoint than the corresponding boundary on an aspirated 
[pʰa]-[tʰa] (i.e., /pa/-/ta/) continuum (Alfonso & Daniloff, 1980; Massaro & Oden, 1980; Miller, 
1977; Oden & Massaro, 1978; Ohde & Stevens, 1983; Repp, 1978). In each of these studies, the 
stimuli within each continuum differed in the onset frequencies and transitions of the second and 
third formants (F2 and F3), whereas the difference between the two continua rested on voice 
onset time (VOT). In the case of aspirated stops, this meant a delay in voicing onset, presence 
of aspiration noise, and attenuation or complete suppression of the first formant (F1) during the 
aspirated interval. Formant transitions and VOT thus were varied in a strictly orthogonal fashion. 

* Language and Speech, in press. 

No satisfactory explanation has been provided for the perceptual boundary shift, although 
several authors have speculated about its causes. If we include several additional possibilities that 
have occurred to us, no fewer than six different hypotheses result, which we shall discuss briefly to 
show that all but the one addressed by our study (No. 6) are unlikely candidates. 

(1) Feature processing interaction. Miller (1977) attributed the boundary shift to nonin- 
dependence in phonetic feature processing. (See also Haggard, 1970; Oden & Massaro, 1978; 
Sawusch & Pisoni, 1974; Smith, 1973.) At the time, when feature detector theory was at the 
height of its popularity (see Remez, 1987), this hypothesis may have seemed to have some ex- 
planatory value. Basically, however, it is just a restatement of the finding, since it would be just 
as valid if the boundary shift went in the opposite direction. One testable prediction may be 
derived from this hypothesis, however: The shift in the place of articulation boundary should be 
a step function of VOT; that is, for a series of place of articulation continua differing by small in- 
crements in VOT, the perceptual boundary between labial and alveolar categories should change 
abruptly as VOT crosses the phonological voicing boundary but should remain relatively constant 
within voicing categories. In other words, the location of the place boundary should be a function 
of the perceived voicing category (the discrete response of a hypothetical "voicing detector"), 
not of VOT. In several experiments using appropriate stimulus arrays, Oden and Massaro (1978) 
and Massaro and Oden (1980) actually obtained results consistent with this prediction, although 
they nevertheless chose to emphasize the "relatively continuous" nature of the boundary change 
(Massaro & Oden, 1980, p. 1003). Repp (1978), on the other hand, obtained fairly continuous 
place boundary changes as a function of VOT; however, VOT varied over a smaller range in his 
stimuli. In view of these inconclusive data, the feature processing interaction hypothesis cannot 
be dismissed, but it has little explanatory power in the context of contemporary theorizing, es- 
pecially since it is indifferent to the direction of the boundary shift. The same can be said about 
Oden and Massaro's (1978) feature integration model, which, even though it assumes independent 
processing of acoustic features, represents the phonetic feature interaction at the level of mental 
category prototypes. The model fits the data well, but it does not explain the direction of the 
effect. 

(2) Presence versus absence of F1. A second hypothesis is that the boundary shift originates 
in the auditory system: Some auditory interaction may make the F2 and F3 transitions of aspirated 
stops appear to be lower in frequency than those of unaspirated stops, or may increase the relative 
perceptual salience of rising (labial) versus falling (alveolar) F2 and F3 transitions in aspirated 
as compared to unaspirated stops. The first formant could be involved in such an interaction. 
Because F1 tends to be weak during natural aspiration, and because "F1 cutback" is in fact 
an important cue for phonological voicelessness in initial English stop consonants (Liberman, 
Delattre, & Cooper, 1958), F1 has been attenuated as a matter of routine in the synthesis of 
aspirated stop consonants. There is also evidence in the literature that, in certain situations, 
the F1 transition, when it is present, may influence the perception of transitions in the higher 
formants: When a syllable is split between the ears, so that F1 goes to one ear and F2 to the 
other ear, the discriminability of F2 transitions is improved relative to a monaural or binaural 
condition (Danaher & Pickett, 1975; Rand, 1974). This improvement has been attributed to 
a release from peripheral "upward spread of masking" by F1. It seems reasonable that such 
masking would have a greater effect on F2 transitions that are close in frequency to F1 and/or 
have a similar (rising) trajectory; thus it might decrease the relative salience of labial transitions 
in unaspirated stops, so attenuation of F1 in aspirated stops would then result in a relative 
enhancement of these transitions, in accord with the observed perceptual boundary shift. In 
dichotic split-formant studies, Perl and Haggard (1974) and especially Perl (1975) did observe "a 
tendency for increased dichotic release from masking where initial F2 transitions tend towards 
the same slope as accompanying F1 transitions" (Perl, 1975, p. 36). Unfortunately, most other 
relevant studies failed to show such trends (Grunke & Pisoni, 1982; Hannley & Dorman, 1983; 
Nusbaum, Schwab, & Sawusch, 1983; Schwab, 1981; Turek, Dorman, Franks, & Summerfield, 
1980). In addition, informal observations by the first author suggest that synthetic syllables in 
which phonological voicelessness is cued solely by F1 cutback without accompanying aspiration 
noise (cf. Liberman et al., 1958) do not exhibit any place boundary shift. The upward spread of 
masking hypothesis thus seems untenable. 

(3) Absence of release burst. A third possible explanation takes note of the fact that most 
studies have employed synthetic syllables without release bursts. Alveolar release bursts, because 
of their different spectral energy distribution, are more intense than labial release bursts, and 
aspirated stops tend to have stronger bursts than unaspirated stops (Zue, 1976). Burst amplitude 
(with spectral properties held constant) has been shown to be a secondary place of articulation 
cue: Listeners report more labial stop percepts when the amplitude is low than when it is high 
(Ohde & Stevens, 1983; Repp, 1984). Thus, if listeners expect a burst, its absence may lead to 
a general bias toward labial stop percepts, and this bias may be larger for stimuli that normally 
have stronger release bursts, viz., aspirated stops. In other words, the absence of a strong burst 
may make a stimulus sound even more labial than does the absence of a weak burst. However, 
Ohde and Stevens (1983) employed aspirated and unaspirated stimuli that included synthetic 
bursts and still found a large place boundary shift as a function of aspiration. Therefore, the 
"missing burst" hypothesis seems less promising now than it did a few years ago. Besides, it is 
almost impossible to test rigorously because of the difficulty of synthesizing release bursts that 
are both realistic and matched to the formant transitions on a place of articulation continuum. 

(4) VOT as a place cue. It is well known that alveolar stops have longer VOTs than labial 
stops, especially in their aspirated forms, although the difference is not very large and there is 
substantial overlap of the VOT distributions (see, e.g., Lisker & Abramson, 1967; Ohde, 1984). 
Even so, it is conceivable that the temporal aspect of VOT serves as a weak place cue in aspirated 
stops, such that listeners are somewhat more likely to perceive labials when VOT is relatively 
short, and alveolars when VOT is relatively long. If the VOTs of the synthetic [pʰa]-[tʰa] stimuli 
used in earlier studies were on the short side, the place boundary shift in favor of labial responses 
could be accounted for. The longest VOT used by Oden and Massaro (1978) and Massaro and 
Oden (1980) was 40 ms; that employed by Repp (1978) was 42 ms; Miller (1977) and Ohde and 
Stevens (1983) used a VOT of 50 ms for their aspirated stops; and Alfonso and Daniloff (1980) 
used a VOT of JO ms. The average VOT of [pʰa] and [tʰa] produced in isolation is about 70 
ms, with the VOT of [tʰa] being some 10 ms longer than that of [pʰa] (Lisker & Abramson, 
1967; present study). Thus all VOTs used in previous synthesis were indeed on the short (labial) 
side. It is noteworthy, however, that the largest place boundary shifts (about 145 Hz in terms 
of F2 onset frequency) were obtained by Alfonso and Daniloff (1980), who used the longest VOT 
for their aspirated continuum. This observation, together with the great variability of VOTs in 
natural speech, makes it unlikely that VOT could be responsible for the boundary shift. 

(5) Aspiration noise spectrum and/or intensity as a place cue. Massaro and Oden (1980) 
proposed that the aspiration noise itself may provide a cue for labial place of articulation (see 
also Ohde & Stevens, 1983). At first glance, this hypothesis seems to ignore the fact that in 
synthetic stimuli (as in natural speech) the aperiodic source passes through the same F2 and 
F3 filters as the periodic source, leading to similar spectral shapes above F1. It is possible, 
however, that differences in the spectral slope and/or amplitude of periodic and aperiodic source 
spectra somehow contribute to the perceptual boundary shift, especially if they deviate from 
what is observed in natural speech. Unfortunately, these parameters are commonly omitted from 
descriptions of synthetic stimuli, and information about their magnitudes in natural speech is 
also hard to come by. Massaro and Oden did find that labial responses increased further when 
aspiration noise intensity was increased; however, since labial responses increased with VOT (up 
to 40 ms, the longest value used) in their study, the result may reflect the fact that stimuli with 
higher aspiration levels are phonetically equivalent to stimuli with longer VOTs, perhaps due to a 
time-intensity reciprocity in auditory perception (Darwin & Seton, 1983; Repp, 1979). Certainly 
there is no reason to believe that natural labial stops are characterized by more intense aspiration 
than alveolar stops. In summary, while the global acoustic characteristics of natural aspiration 
bear closer examination, it seems unlikely that they vary with place of articulation and, hence, 
that they could function as secondary place of articulation cues. 

(6) Different formant transitions in unaspirated and aspirated stops. The sixth and final 
hypothesis is that the formant transitions are different in aspirated and unaspirated stops, so 
that listeners apply different criteria for place decisions along a formant transition continuum 
depending on whether aspiration is present or absent. Despite a long tradition of synthesizing 
unaspirated and aspirated stops with identical formant transitions for use in perceptual exper- 
iments (which may derive, in part, from the "locus" theory of Delattre, Liberman, & Cooper, 
1955), there is in fact some limited support for this hypothesis in the acoustic phonetics liter- 
ature. Fant (1973) reports that /p/ (i.e., [pʰ]) tends to have higher F2 onsets than /b/ (i.e., 
[p]) before back vowels such as /a/. However, his very limited data derive from a single speaker 
of Swedish, and some of the formant frequencies reported seem unusually low. Similar data for 
English collected by Lehiste and Peterson (1961) and replotted by Fant (1973) are suggestive at 
best. More convincing are Gay's (1978) spectrographic measurements of F2 onset frequencies in 
syllables produced by three male American speakers: F2 onset in /pap/ and /pup/ was about 
180 Hz higher than in /bap/ and /bup/; however, it was about 125 Hz lower in /pip/ than in 
/bip/. 

Gay mentions three possible causes of the difference in formant transitions preceding back 
vowels: (a) The coarticulatory hypothesis: Fant (1973) speculated that /b/ is coarticulated more 
strongly with a following back vowel (i.e., the tongue is more nearly in position for the vowel before 
the release of the stop closure) than is /p/, while no such difference exists between /d/ and /t/. 
(b) The release timing hypothesis: As the articulators begin to move towards the vowel, the release 
of aspirated stops may occur earlier in time than that of unaspirated stops, so that energy begins 
while the articulators are still farther away from the vowel target (Öhman, 1965; see Fant, 1973, 
p. 118). The acoustic consequences are similar to those predicted by the coarticulatory hypothesis, 
but it should be possible to overlay the formant trajectories of aspirated and unaspirated stops 
after correcting for the time shift (cf. Fant, 1973). (c) The subglottal coupling hypothesis: The 
higher F2 onsets for aspirated stops may arise from the open glottis during aspiration. This 
acoustic explanation appears very plausible in view of research by Lehiste (1964, cited in Lehiste, 
1970) and Kallail and Emanuel (1984a, 1984b) on whispered vowels, in which especially F1 but 
also F2 and F3 tend to be higher than in phonated vowels, with the possible exception of high front 
vowels. Indeed, glottal opening is likely to be wider at the beginning of aspiration than during 
whisper (Catford, 1977). Fant, Ishizaka, Lindqvist, and Sundberg (1972) have modeled these 
effects of subglottal coupling, which may include additional subglottal formants in the aspiration 
spectrum, especially right after the release. 

A clear demonstration of higher formant frequencies (especially of F2) in aspirated than 
in unaspirated stop consonants preceding [a] would be of value for three reasons: First, the 
relevant data in the literature are incomplete and not easy to find; in particular, there have 
been no comparisons of the complete formant transitions in unaspirated and aspirated stops for 
both labial and alveolar places of articulation. Second, such data would provide an important 
guideline for realistic speech synthesis. Third, they would provide a sufficient explanation of the 
perceptual boundary shift and provide yet another illustration that listeners engaging in linguistic 
classification rely on tacit knowledge of a wealth of phonetic detail (see Repp, 1987). 

Only the syllables [pa], [ta], [pʰa], and [tʰa] were considered in this study, because they were 
the endpoints of the continua used in previous perceptual studies. Nevertheless, it was possible 
even in this limited context to address the three hypotheses about the origin of differences in 
formant frequencies between unaspirated and aspirated stops, if any were found: (a) If Fant's 
coarticulatory hypothesis is correct, the difference should be more pronounced for labial than for 
alveolar stops, since the tongue body is less free to anticipate the shape of the following vowel 
during alveolar closure. Also, the time course of the labial F2 transition should be independent of 
VOT in aspirated tokens; that is, it should be a function of the movements of the upper articulators 
only. (b) If Öhman's release timing hypothesis is correct, the results should be similar, but in 
addition it should be possible to superimpose the average formant tracks of unaspirated and 
aspirated tokens by shifting them in time relative to each other. Thus, a finding of rising F2 
transitions for [pa] but falling F2 transitions for [pʰa] would be incompatible with the release 
timing hypothesis, but not necessarily with Fant's coarticulatory hypothesis. (c) If the subglottal 
coupling hypothesis is correct, the F2 difference between aspirated and unaspirated stops should 
be present for both labial and alveolar stops and should disappear with voicing onset in aspirated 
tokens. Of course, these hypotheses are not mutually exclusive, and more than one explanation 
may be supported by the data. 

In addition to providing measurements of F2 trajectories to address these principal hypothe- 
ses, the present study also yielded data on F1 and F3 frequencies, and on the spectral tilt and 
relative amplitude of aspiration, information that is difficult to locate in the literature but is 
useful for speech synthesis. 

Methods 

Ten male speakers of American English produced the syllables [pa], [ta], [pʰa], and [tʰa] five 
times in random order, reading from a list of randomized syllables spelled BA, DA, PA, TA. They 
were recorded in a sound-insulated booth using a Sennheiser microphone and an Otari MX5050 
tape recorder located in an adjacent booth. The mouth-to-microphone distance was about 20 
inches. All 200 utterances were low-pass filtered at 4.9 kHz and digitized at a sampling rate of 10 
kHz with high-frequency pre-emphasis. Each file was edited to eliminate silence or (rare) voicing 
preceding the release. A 14-coefficient LPC analysis was then conducted using a 20-ms Hamming 
window advancing in 10-ms steps, and formant frequencies were estimated using the root solving 
method (ILS package, Version 4.0, distributed by Signal Technology, Inc.). 
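
The analysis just described (a 14-coefficient LPC over 20-ms Hamming windows advancing in 10-ms steps, with formants read from the roots of the predictor polynomial) can be sketched roughly as follows. This is an illustration in Python with NumPy, not the ILS implementation: the function names, the autocorrelation (Levinson-Durbin) formulation, and the 90-Hz and 400-Hz thresholds used to discard implausible roots are assumptions introduced here.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin.

    Returns the prediction-error filter a[0..order] with a[0] = 1.
    Assumes the frame is not all zeros.
    """
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1 : n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def formants_from_lpc(a, fs, min_freq=90.0, max_bandwidth=400.0):
    """Root solving: convert complex poles to candidate formant frequencies (Hz).

    min_freq and max_bandwidth are illustrative thresholds, not values from the study.
    """
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]       # one member of each conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    bandwidths = -np.log(np.abs(roots)) * fs / np.pi
    keep = (freqs > min_freq) & (bandwidths < max_bandwidth)
    return np.sort(freqs[keep])

def formant_track(signal, fs, order=14, win_ms=20.0, step_ms=10.0):
    """Frame-by-frame formant candidates using a sliding Hamming window."""
    win = int(round(fs * win_ms / 1000.0))
    step = int(round(fs * step_ms / 1000.0))
    window = np.hamming(win)
    track = []
    for start in range(0, len(signal) - win + 1, step):
        frame = np.asarray(signal[start : start + win], dtype=float) * window
        track.append(formants_from_lpc(lpc_autocorr(frame, order), fs))
    return track
```

At a 10-kHz sampling rate, the first 110 ms of a token (1100 samples) yield ten overlapping frames under these settings. Assigning the candidates in each frame to F1, F2, and F3 still requires the kind of manual checking described in the next paragraph.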

The resulting arrays of formant frequencies as a function of time were cleaned up by hand 
to eliminate occasional spurious peaks, to make sure that all frequencies were aligned with the 
appropriate formants, and to deal with the problem of missing values. One speaker was excluded 
from further analysis because of insufficient F2 data for labial stops. For the other speakers, 
missing formant frequencies were filled in by interpolating between preceding and following values 
or, if they occurred at the onset, by extending the first existing value backward in time. Missing 
frequencies were especially common in the initial time frames; this was not surprising, since 
release bursts often do not have a clear formant structure. Thirty-eight percent of the F2 data 
were missing in frame 1, 19% in frame 2, and from 12% to 3% in frames 3-10. Eighty-six percent 
of all missing values were in aspirated tokens; of these, 62 percent were in [pʰa] tokens and 38 
percent in [tʰa] tokens. For F3, the percentages of filled-in values were 28% in frame 1 and between 
7 and 15% in frames 2-10. While interpolation of missing F2 and F3 values in later frames should 
not have distorted the analysis results in any way, the filling in of missing initial values by level 
extension of later values (a conservative procedure) may have resulted in an underestimation of 
existing differences in formant frequencies between unaspirated and aspirated tokens at onset. 
F1, of course, was generally absent during aspiration and was also spurious in unaspirated tokens 
for two speakers. To compare F1 in unaspirated labials and alveolars, the F1 data of the eight 
speakers with fairly complete values were analyzed after filling in missing values (36% in frame 
1, 2-6% in frames 2-10). 
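
The filling-in rule described above (linear interpolation between neighboring values, and level extension of the first existing value back to the onset) might be sketched as follows. The function name and the use of NaN to mark missing frames are assumptions made for illustration; the treatment of missing values at the end of a track, which the text does not mention, is likewise an assumption (the last existing value is held).

```python
import numpy as np

def fill_missing(track):
    """Fill NaN entries in one formant track.

    Interior gaps are linearly interpolated; leading gaps take the first
    measured value. Trailing gaps hold the last measured value, which is
    an assumption, since the text does not cover that case.
    """
    values = np.asarray(track, dtype=float)
    frames = np.arange(len(values))
    known = ~np.isnan(values)
    if not known.any():
        raise ValueError("track has no measured values to fill from")
    # np.interp clamps to the end values outside the measured range
    return np.interp(frames, frames[known], values[known])

# Hypothetical F2 track (Hz) with two missing onset frames and one interior gap
example = [np.nan, np.nan, 1200.0, np.nan, 1000.0, 980.0]
filled = fill_missing(example)
```

Because the onset frames simply copy a later value, any real onset difference is flattened rather than exaggerated, which is why the procedure is conservative.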

Voice onset times of aspirated tokens were measured in a waveform display by locating the 
onset of the first glottal pulse. In addition, to corroborate the LPC analysis results and to 
examine the spectral and amplitude characteristics of aspiration, FFT spectra of all utterances 
were obtained from 20-ms Hamming windows centered 10, 30, and 50 ms after the release. To 
reduce random level fluctuations, the spectra were averaged over the five repetitions of each 
syllable by each speaker. From these average spectra we picked F2 peaks by eye wherever possible, 
interpolating if there were two closely adjacent peaks in the relevant region. This yielded complete 
estimates of F2 frequencies for all 10 speakers at the three time points for [pa] and [ta]; for [tʰa], 
only 2 data points (7% of the data) were missing; for [pʰa], however, peaks could not be located 
in 10 instances (33% of the data). As with the LPC data, the missing values were interpolated 
or extrapolated from the existing ones, so as to have a complete matrix for calculation of means 
and for statistical analysis. 
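
A rough sketch of the spectral cross-sections described above: an FFT of a 20-ms Hamming-windowed segment centered at a given time after the release, averaged over repetitions. The function names are hypothetical, and averaging the spectra in decibels is an assumption, since the text does not say whether levels or power were averaged.

```python
import numpy as np

def windowed_spectrum_db(signal, fs, center_ms, win_ms=20.0):
    """Level spectrum (dB) of a Hamming-windowed segment centered at center_ms."""
    n = int(round(fs * win_ms / 1000.0))
    start = int(round(fs * center_ms / 1000.0)) - n // 2
    if start < 0 or start + n > len(signal):
        raise ValueError("analysis window falls outside the signal")
    frame = np.asarray(signal[start : start + n], dtype=float) * np.hamming(n)
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, 10.0 * np.log10(power + 1e-12)   # small offset avoids log(0)

def average_spectra_db(tokens, fs, center_ms):
    """Average the dB spectra of several repetitions (assumed averaging domain)."""
    spectra = [windowed_spectrum_db(tok, fs, center_ms)[1] for tok in tokens]
    return np.mean(spectra, axis=0)
```

With a 10-kHz sampling rate, a 20-ms window gives a frequency resolution of 50 Hz, and the window centered 10 ms after the release begins exactly at the release.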

Results and Discussion 

F2 Transitions 

Because of considerable differences in utterance durations for different speakers, only the 
first 110 ms of each token (i.e., 10 overlapping 20-ms analysis time frames) were considered. The 
cleaned-up arrays of formant values were averaged across all tokens of all speakers to obtain an 
overall picture of the differences in formant transitions. These average F2 transitions are plotted 
as the connected points in Figure 1. It is evident that both aspirated syllable types had higher F2 
onsets than their unaspirated counterparts, and that this difference gradually decreased over the 
first 70 ms or so. Right after the release the difference was larger for labials than for alveolars, 
but after 30 ms it seemed independent of place of articulation. In addition, it may be noted that 
F2 was higher for alveolar than labial tokens well beyond the first 100 ms. Formant transitions 
thus may be a good deal longer than the (approximately) 50 ms often cited in the literature and 
employed in speech synthesis. 

Figure 1. The connected points show the average second formant (F2) transitions over the first 100 ms of [pa], 
[pʰa], [ta], and [tʰa], as determined by LPC analysis. Each transition represents the average of 45 utterances 
(5 tokens from each of 9 speakers). The unconnected points represent average F2 frequency estimates from FFT 
analysis of all 10 speakers' productions. Formant frequencies are plotted at the centers of the 20 ms time windows. 

A repeated-measures analysis of variance was conducted on the token averages with place 
of articulation, aspiration, and time as factors. All main effects and interactions were significant 
at p = .0005 or less, except for the place by aspiration interaction, which was nonsignificant. 
The overall magnitude of the aspiration effect was thus similar for labial and alveolar stops. The 
triple interaction, F(9,72) = 5.71, p < .0001, however, confirms that the aspiration effect was 
smaller for alveolar than for labial stops immediately after the release. Separate analyses of labial 
and alveolar tokens showed that the unaspirated/aspirated difference was significant for both 
places of articulation (labial: F(1,8) = 32.67, p = .0004; alveolar: F(1,8) = 15.03, p = .0047). 
In addition, its decrease as a function of time was reflected in highly significant interactions 
between aspiration and time (labial: F(9,72) = 29.69, p < .0001; alveolar: F(9,72) = 11.51, 
p < .0001). This pattern was shown by all individual speakers. 
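
The degrees of freedom of the F ratios reported above follow directly from the design, which offers a quick consistency check. The sketch below assumes that every factor was treated as within-subject and that each effect was tested against its interaction with speakers; the function name is hypothetical.

```python
def rm_anova_df(effect_levels, n_subjects):
    """Degrees of freedom for a within-subject effect and its error term.

    effect_levels lists the number of levels of each factor taking part in
    the effect, e.g. [2, 10] for the aspiration by time interaction.
    Assumes the error term is the effect-by-subject interaction.
    """
    df_effect = 1
    for levels in effect_levels:
        df_effect *= levels - 1
    return df_effect, df_effect * (n_subjects - 1)

# Nine speakers in the LPC analysis; 2 aspiration levels, 2 places, 10 time frames
aspiration_df = rm_anova_df([2], 9)
aspiration_by_time_df = rm_anova_df([2, 10], 9)
triple_interaction_df = rm_anova_df([2, 2, 10], 9)
```

Under these assumptions the results are (1, 8) for the aspiration effect and (9, 72) for both interactions involving time, in agreement with the values given in the text.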

Similar analyses were conducted on the F2 frequency estimates derived from FFT spectra; the 
averages are plotted as the unconnected points in Figure 1. As pointed out in the Methods section, 
the data for [pʰa] were somewhat unreliable, which explains the major discrepancy between the 
LPC and FFT frequency estimates for that syllable. For the other syllables, there was reasonable 
agreement between the two sets of data, although FFT estimates seemed to be systematically 
lower than LPC estimates for unaspirated stops. Absolute differences aside, the FFT data clearly 
corroborate the finding of higher F2 frequencies during aspiration. In the overall analysis of 
variance, all effects except the place by aspiration interaction were significant at p = .01 or less. 
Tested separately, the main effect of aspiration was significant for both labial, F(1,9) = 7.72, p = 
.0214, and alveolar stops, F(1,9) = 34.09, p = .0002; for the latter there was also a significant 
change of the effect over time, F(2,18) = 8.27, p = .0028. 

The magnitude of the difference for labials at release is in good agreement with Gay's (1978) 
data, as are the absolute LPC-derived formant frequencies. The magnitude of the F2 difference 
between phonated and whispered [a] reported by Kallail and Emanuel (1984b) is also similar. 
This last observation, together with the finding of similar differences for labials and alveolars, 
except right after the release, suggests that the explanation is to be found in the open glottis 
during aspiration. 

Of the two alternative explanations, Öhman's release timing hypothesis seems to be incon- 
sistent with the present data. Even granting possible distortions due to averaging over tokens 
representing different vocal tract sizes and speaking rates, there is no way the transitions for 
unaspirated and aspirated tokens could be time-shifted to coincide in Figure 1. This is especially 
true in the case of [pa], which has a barely rising F2 transition, and [pʰa], which has a clearly 
falling one. Thus, this hypothesis can be dismissed. Fant's coarticulation hypothesis predicted a 
smaller difference for alveolars than for labials, which was found immediately after the release but 
not some tens of milliseconds later. It is possible that, as the tongue is freed from the constraint 
of the alveolar closure, it rapidly adjusts to the following vowel shape, and more so in [ta] than in 
[tʰa]. (Alternatively, the presence of a frication source at the alveolar constriction may obscure 
any existing F2 differences during alveolar release bursts.) The coarticulatory hypothesis thus is 
not incompatible with the data in Figure 1, even though Fant himself commented only on labial 
stops. 

Another prediction of Fant's hypothesis, however, is that the time course of the F2 differ- 
ence should be independent of when voicing starts in aspirated tokens. The subglottal coupling 
hypothesis, on the other hand, predicts that the difference should end at voicing onset. The 
F2 trajectories for [pʰa] and [tʰa] shown in Figure 1 were obtained by averaging over aspirated 
tokens with VOTs ranging from 40 to 126 ms, with an average of 70 ms (66 ms for labials, 73 ms 
for alveolars), which resulted in considerable smearing in the time domain. An alternative way to 
analyze the data is to line up all aspirated tokens at voice onset rather than at the release. Figure 
2 shows the average F2 frequencies in the vicinity of voice onset after lining up aspirated tokens in 
this way, with unaspirated tokens lined up correspondingly by yoking them to aspirated tokens of 
the same speaker and shifting them by the same amount along the time axis. It can be seen that 
the F2 difference indeed disappears at voice onset for alveolar stops, and nearly so for labial stops. 
In analyses of variance on the five time frames following voice onset, no significant F2 differences 
were obtained for either labials or alveolars. An additional analysis including rank-ordered VOT 
as a factor was conducted to determine whether F2 onset frequency in aspirated stops increased 
with VOT. The result was negative. 
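
The realignment at voice onset can be illustrated with a short sketch. It assumes that each formant track is sampled at frame centers spaced 10 ms apart starting 10 ms after the release, and that the frame nearest to the measured VOT is taken as the onset frame; the function name, these conventions, and the NaN padding are assumptions made here. A yoked unaspirated token would be passed through the same function with the VOT of its aspirated partner.

```python
import numpy as np

def align_at_voice_onset(track, vot_ms, before=5, after=5,
                         step_ms=10.0, first_center_ms=10.0):
    """Extract frames around voice onset from a release-aligned track.

    Returns before + after + 1 values; positions that fall outside the
    track are NaN. The frame timing conventions are assumptions.
    """
    values = np.asarray(track, dtype=float)
    onset = int(round((vot_ms - first_center_ms) / step_ms))
    out = np.full(before + after + 1, np.nan)
    for k in range(-before, after + 1):
        idx = onset + k
        if 0 <= idx < len(values):
            out[k + before] = values[idx]
    return out

# Hypothetical track whose values simply encode the frame index times 10
demo_track = np.arange(20) * 10.0
demo_aligned = align_at_voice_onset(demo_track, vot_ms=70.0, before=2, after=3)
```

Averaging such onset-aligned segments across tokens avoids the smearing that arises when tokens with VOTs between 40 and 126 ms are averaged in release-aligned time.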

Had the differences in F2 trajectories extended beyond voicing onset or had they ended 
much sooner, the coarticulatory hypothesis might have been favored over the subglottal coupling 
hypothesis. On the other hand, a positive correlation between F2 onset frequency and VOT in 
aspirated stops would have supported the latter hypothesis. As it is, the data are still consistent 
with both hypotheses, though the subglottal coupling hypothesis would seem to provide a more 
parsimonious account: The acoustic consequences of subglottal coupling are necessary effects, 
while differences in the position of the upper articulators are not (as long as no direct observations 
of articulation show they do exist). The gradual decline in the F2 difference prior to voice onset 
probably reflects the gradual narrowing of the glottal opening before voicing starts (see, e.g., 
Hirose, 1977; Kagaya, 1974). The smaller difference between F2 of [ta] and [tʰa] right after release 
may be due to broadband frication noise generated while the constriction is narrow. Subglottal 
coupling thus provides a sufficient explanation of the observed differences in F2 trajectories. 

Figure 2. Average second formant (F2) frequencies in the vicinity of voicing onset for [pʰa] and [tʰa] tokens 
lined up at voicing onset, and for yoked [pa] and [ta] tokens lined up at corresponding time points. 



F1 and F3 Transitions 

We also examined differences in F3 transitions in the same manner. However, there were 
no significant F3 differences as a function of aspiration in either labials or alveolars, whether 
aligned at release or at voice onset. Kallail and Emanuel (1984b), too, found only a very small 
(presumably nonsignificant) F3 difference between voiced and whispered [a]. 

F1, on the other hand, is strongly affected by a change in source, being about 250 Hz 
higher in whispered than in phonated male [a] (Kallail & Emanuel, 1984b), but its increased 
bandwidth makes frequency measurements difficult, and we did not attempt to determine F1 
frequencies during aspiration. We did compare F1 transitions in unaspirated [pa] and [ta] for 
eight subjects (for two subjects the LPC analysis did not yield reliable F1 estimates, but the 
subject excluded from the F2 analysis was included here) and found a significant difference, 
F(1,7) = 21.87, p = .0023, which decreased over time, F(9,63) = 17.33, p < .0001. All subjects 
showed higher F1 onsets in [pa] than in [ta]; the averages were 669 and 589 Hz, respectively. 
After 100 ms, this 80 Hz difference had dwindled to 28 Hz. 

Figure 3. Average Fourier (FFT) spectra of unaspirated and aspirated stops at three points in time, calculated 
using Hamming windows centered 10, 30, and 50 ms after the release. Each spectrum represents the average of 50 
utterances (5 tokens from each of 10 speakers). The upper function in each panel represents the unaspirated stops, 
and the lower function the aspirated ones. All spectra include high-frequency pre-emphasis of approximately 6 
dB/octave above 1 kHz, and less below. 

Aspiration Noise: Spectral Tilt and Relative Amplitude 

Finally, we compared spectral cross-sections of aspirated and unaspirated tokens at three 
points in time (10, 30, and 50 ms after the release). Figure 3 shows these spectra averaged 
over all tokens of all speakers. Although the formant peaks in these grand average spectra are 
somewhat flattened because of between-speaker variability in absolute formant frequencies, the 
general pattern is fairly representative of individual speakers' utterances. Three aspects deserve 
attention. First, the upward shift in F2 during aspiration is evident, except in the first time 
frame for alveolar stops, where the spectrum reflects the [s]-like frication noise that is part of 
the release burst (cf. Figure 1). The F2 peak is rather broad for [pʰa], which was also true for 
most individual speakers' spectra. On its lower skirt, a raised and attenuated F1 (see Kallail & 
Emanuel, 1984a, 1984b) may have contributed to this prominence. On the upper skirt, additional 
subglottal resonances may have occurred (Fant, Ishizaka, Lindqvist, & Sundberg, 1972), although 
we did not observe any distinct peaks in individual spectra that could be identified with such 
resonances. 

Second, it is obvious that the spectrum during aspiration has a different tilt from that during 
voicing. Acoustic theory predicts a -12 dB/octave spectral slope when the source is voiced, and 
a -6 dB/octave slope when the source is noise from the glottis (Fant, 1960; Hillman, Oesterle, 
& Feth, 1983). Although the spectra in Figure 3 are plotted on a linear frequency scale and 
include high-frequency pre-emphasis of approximately 6 dB/octave above 1 kHz, it is clear that 
they roughly conform to the predictions. If a correction for pre-emphasis were applied, all spectra 
would have a downward tilt, the voiced spectra more so than the aspirated ones, as predicted. 
Labial and alveolar tokens do not seem to differ in spectral tilt. 

Third, the relative amplitude of aspiration should be noted. It is especially difficult to 
locate information in the literature on this parameter, which is often a source of frustration in 
synthesizing aspirated stops. As can be seen, the levels of voiced and aspirated spectra converge 
between 3.5 and 4 kHz but diverge increasingly at lower frequencies. The differences observed 
are somewhat larger than predicted on the basis of a 6 dB/octave slope difference; in fact, they 
are more in accord with a linear 6 dB/kHz slope difference (cf. Hillman et al., 1983): On the 
average, the levels of voiced and aspirated F3 peaks differed by 11 dB, and those of F2 peaks by 
18 dB, with very similar differences for labials and alveolars. Level differences were even larger 
in the F1 region, due to the reduction of F1 during aspiration. There was enormous individual 
variability, however, in the absolute magnitude of these differences: F3 level differences ranged 
from 4 to 17 dB across speakers, and F2 level differences from 7 to 27 dB, probably reflecting 
individual differences in source spectra. 
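
The contrast between the two slope hypotheses can be made concrete with a rough calculation. The 
sketch below is only an illustration: it assumes that the voiced and aspirated spectra meet at 
3.75 kHz (the midpoint of the 3.5-4 kHz range mentioned above) and it uses round formant 
frequencies of 1.3 kHz for F2 and 2.5 kHz for F3. These three values are assumptions made for 
the example, not measurements reported here. 

```python
import math

CONVERGENCE_KHZ = 3.75  # assumed midpoint of the 3.5-4 kHz convergence range


def octave_model_db(freq_khz, slope_db_per_octave=6.0):
    """Level difference if the two slopes differ by a fixed number of dB per octave."""
    return slope_db_per_octave * math.log2(CONVERGENCE_KHZ / freq_khz)


def linear_model_db(freq_khz, slope_db_per_khz=6.0):
    """Level difference if the two slopes differ by a fixed number of dB per kHz."""
    return slope_db_per_khz * (CONVERGENCE_KHZ - freq_khz)


# name -> (assumed formant frequency in kHz, observed mean level difference in dB)
formants = {"F2": (1.3, 18.0), "F3": (2.5, 11.0)}

for name, (freq_khz, observed_db) in formants.items():
    print(f"{name}: observed {observed_db:.0f} dB, "
          f"octave model {octave_model_db(freq_khz):.1f} dB, "
          f"linear model {linear_model_db(freq_khz):.1f} dB")
```

With these assumed values both models fall short of the observed differences, but the linear 
model comes closer at both formants, which is the pattern described in the text. 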

Summary and Conclusions 

We have shown that aspirated labial and alveolar stop consonants preceding [a] have F2 
transitions that start at significantly higher frequencies than those of unaspirated cognates. The 
difference gets smaller over time and disappears with voice onset, which suggests that it is due to 
upward shifts in vocal tract resonances caused by the open (and gradually closing) glottis during 
aspiration. These data replicate and extend earlier observations by others, and they provide a 
valuable guideline for improved speech synthesis. Fant (1973, p. 131) recommended long ago that 
a "minor correction for the effect of glottal opening on the F-pattern" be added in synthesis, and 
noted that "an open glottis increases F2 and F3 by about 50-100 Hz." Our data suggest that, in 
the context of [a], the effect is about twice as large but restricted to F2. It is astonishing that this 
difference has gone relatively unnoticed for so long, and that it has been completely ignored in 
the long series of studies employing synthetic stop-consonant-vowel (mostly [-a] or [-*]) syllables 
and VOT continua over the last 20 years. 

For reasons that are not well understood, the raising of F2 during aspiration seems to be 
absent for high front vowels such as [i] (Gay, 1978; Kallail & Emanuel, 1984a, 1984b). It might be 
predicted, then, that the perceptual category boundaries on [pi]-[ti] and [pʰi]-[tʰi] continua should 
be similar. Unfortunately, this interesting prediction is not testable because F2 transitions do not 
reliably differentiate labial and alveolar stops in [i] context (see, e.g., Kewley-Port, 1982). Another 
prediction more amenable to test is that, unless there is differential coarticulation (Fant, 1973), 
the F2 transitions of whispered [pa] and [ta] (i.e., intended /ba/ and /da/) should not differ from 
those of [pʰa] and [tʰa], and the category boundary on a noise-excited synthetic labial-alveolar 
continuum should likewise be similar to that on a [pʰa]-[tʰa] continuum. 

The difference in F2 onset frequencies between aspirated and unaspirated stops preceding 
[a] provides a sufficient explanation of the reliable perceptual shift in the labial-alveolar category 
boundary on a formant transition continuum as a function of VOT. The magnitude of perceptual 
boundary shifts reported in the literature (expressed in terms of F2 onset frequency, about 100 
Hz on the average) matches the magnitude of the average acoustic difference in F2 transitions. If 
aspiration is introduced in a synthetic syllable without changing the F2 transition, as has been 
the custom, listeners expect the transition to be higher and therefore perceive the stimulus as 
relatively more labial. The effect of glottal opening on vocal tract resonances thus seems to be 
represented in listeners' tacit knowledge of phonetic regularities. Even though the boundary shift 
is essentially an artifact of primitive synthesis methods, it serves to remind us of the rich store of 
phonetic knowledge that listeners refer to in speech classification. Identification of speech depends 
as much on what listeners know about the sounds and gestures of their language as on what is in 
the acoustic signal (cf. Repp, 1987). 

SENSITIVITY TO INFLECTIONAL MORPHOLOGY IN AGRAMMATISM: 
INVESTIGATION OF A HIGHLY-INFLECTED LANGUAGE* 

K. Lukatela,† S. Crain,† and D. Shankweiler† 

Abstract. We present the results of a study with six Serbo-Croatian-speaking 
agrammatic patients on a test of inflectional morphology in which subjects 
judged whether spoken sentences were grammatical or ungrammatical. 
Sensitivity to two kinds of syntactic features was investigated in these 
aphasic patients: 1) subcategorization rules for transitive verbs (which must 
be followed by a noun in the accusative case; intransitive verbs can be 
followed by nouns in other noun cases); 2) the inflectional morphology 
marking noun case. The test items consisted of three-word sentences 
(noun-verb-noun) in which verb transitivity and appropriateness of the case 
inflection of the following noun were manipulated. Results of the 
grammaticality judgment task show that sensitivity to both syntactic 
properties is preserved in these patients. 

INTRODUCTION 

Recent research on Broca-type aphasia has suggested that syntactic deficits in speech 
production have parallels in speech comprehension. It has been argued that Broca patients with 
agrammatic output not only tend to omit many grammatical words and grammatical morphemes 
in their productions, but also fail to process these words properly in comprehension, although 
special tests were required to bring these problems to light. An important claim in this regard was 
made by Bradley, Garrett, and Zurif (1980), who offered a unified account of Broca-type 
aphasia encompassing both production and comprehension, based on results obtained using lexical 
decision and picture verification tasks. 

* Brain and Language, in press. 

† Also University of Connecticut 

However, using a different experimental task, other researchers have found retained capacities 
of agrammatic aphasics to apprehend syntactic structures and to process the same closed-class 
items that are so often absent in their speech. Retained ability of English-speaking agrammatics 
to detect a variety of syntactic anomalies was uncovered by Linebarger, Schwartz, and Saffran 
(1983) using a grammaticality judgment task. They found that so-called agrammatics perform at 
much better than chance level in judging the acceptability of many syntactic structures, including 
ones that hinge on the availability of closed-class items (e.g., auxiliaries). Such evidence is clearly 
incompatible with any hypothesis that tries to explain agrammatism as loss of tacit knowledge 
necessary to compute syntactic structure. 

Subsequent work by Crain, Shankweiler, and Tuller (1984) supported and extended the find- 
ing of preserved receptive processes in the context of severely limited production. Their agram- 
matic subjects showed retained ability to detect anomalies involving prepositions, determiners, 
particles, and auxiliary verbs — closed-class items that are often missing in the productions of 
Broca-type aphasics. Moreover, the agrammatic subjects in this study were pressed to make 
judgments of grammaticality "on-line," a maneuver that forestalls the possibility that they might 
be adopting procedures for judging grammaticality that do not appropriately reflect their normal 
syntactic parsing routines. 

The present study pursues the issue of receptive capabilities in agrammatism in patients who 
speak a language quite unlike English. If it is correct to characterize agrammatism in linguistic 
terms, then losses in language function that follow lesions in specific language zones will occur 
across all languages, making agrammatism a universal phenomenon. Still, the particular effects 
of lesions may vary with structural differences among languages, because languages sometimes 
employ different means to achieve the same grammatical ends. Thus the same neurological deficit 
could produce different patterns of symptoms in speakers of different languages. Naturally, the 
variation in expression of aphasia caused by cross-language differences cannot be without limit 
if grammatical devices are expressions of a Universal Grammar and subject to its constraints 
(Chomsky, 1981). 

These considerations underscore the importance of cross-language studies of aphasia in eval- 
uating theoretical hypotheses about the source of agrammatism. Among the criteria of theoretical 
adequacy is the requirement that we should be able to predict and account for the manifestations 
of agrammatism in different languages. 

A recent account of agrammatism proposed by Grodzinsky (1984) gives due weight to such 
cross-linguistic considerations, and, indeed, makes detailed predictions about the manifestations 
of agrammatism in several languages. On his account, different languages will have associated 
with them different patterns of impairment, with the patterns reflecting a common principle: 
misselection of closed-class words (i.e., the class that includes articles, auxiliary verbs, particles, 
and prepositions) within the same syntactic category. Other explicit predictions are made, 
including the predictions (i) that closed-class items will not be missing entirely in all languages, and 
(ii) that distinctions between closed-class items belonging to different syntactic categories should be 
preserved despite the loss of sensitivity to distinctions within a category. 

As to the first point, Grodzinsky's theory contrasts free-standing grammatical morphemes, 
which are often missing entirely in the productions of English-speaking agrammatics, with bound 
morphemes (grammatical affixes). According to the theory, inflectional affixes will be neglected 
by agrammatics only when they are unessential to the "well-formedness" of the lexical item; that 
is, if the lexical item without the affix maintains its status as a word. The second 
prediction of the theory, that between-class sensitivity is preserved in agrammatism, follows from 
the proposal that what is lost in agrammatism is the lexical content normally present at the 
terminal nodes of closed-class categories. Information about "part-of-speech" is available, but the 
particular words are not. 

The present study is designed to investigate Grodzinsky's hypotheses, taking advantage of a 
cross-language difference in use of closed-class morphology. Languages that have few word order 
constraints are usually also highly inflected; they make heavy use of bound morphemes. On the 
other hand, fixed word-order languages commonly use word order to mark the same grammatical 
phenomena that are handled by inflectional morphology in nonconfigurational languages. 

Pursuing this distinction, we note that in English the order of constituents is a fundamental 
device for indicating both semantic and syntactic relationships. German and Serbo-Croatian, in 
contrast to English, are relatively free-word-order languages. In Serbo-Croatian, morphological 
inflection is used to express grammatical relations that are expressed by word order in English. 
Unlike English, where case is conveyed by word order (or by a free-standing preposition or 
pronoun), Serbo-Croatian marks case relations by noun inflections, and imposes comparatively 
few restrictions on word order. In order to construct a grammatically correct structure, words 
have to match in gender, number, person, and noun case. This is accomplished by adding an 
appropriate suffix, an inflectional morpheme, to the word stem. The fact that the morphology of 
closed-class items plays such an important role in Serbo-Croatian makes it an ideal language to 
contrast with English, in testing detailed theoretical claims like Grodzinsky's. 

Previous research has shown that both German-speaking and Serbo-Croatian-speaking 
agrammatics show some degree of sensitivity to case inflection even when the test sentence 
departs from standard word order (Friederici, 1982; Heeschen, 1980; Smith & Bates, 1985; Smith & 
Mimica, 1984). Heeschen found that German Broca's aphasics were in error 18% of the time in 
matching semantically reversible sentences to pictures when standard word order was presented, 
and in error 27% of the time when standard word order was violated. In an object-manipulation 
study with Serbo-Croat aphasics, Smith and Mimica showed that agrammatics are differentially 
sensitive to three types of cues: closed-class morphology, semantic constraints, and word order. 
Agrammatics were impaired relative to normals when forced to rely on case inflection cues alone. 
However, it was found that sentence understanding in agrammatic users of Serbo-Croatian was 
facilitated by a convergence of cues that, in combination, often led to successful processing of 
sentences. The available data on agrammatism in different languages neither confirm nor disconfirm 
Grodzinsky's hypothesis that within-class sensitivity to bound morphemes should be impaired in 
agrammatics who speak a language with relatively free word order. Some impairment is evident, 
but in the use of convergent cues to assign noun case, there is also evidence of some sparing of 
function that does not accord well with Grodzinsky's account. 

Problems associated with the choice of task to assess grammatical competence merit 
comment. The findings we have just discussed indicate that aphasic subjects perform better on some 
tasks than others. Tasks that minimize extraneous demands, e.g., the grammaticality judgment 
task, have proven more successful in uncovering retained syntactic ability than tasks like picture 
verification and object manipulation. The latter have been found to underestimate the extent 
of agrammatics' competence. Consequently, in much previous research, failures of agrammatics 
to use closed-class morphological items in analysis of sentences may have reflected a processing 
limitation, and not a structural deficit per se. For these reasons it seems to us that past research 
does not provide the data needed for a definitive test of Grodzinsky's specific claims about the 
linguistic source of agrammatic comprehension errors. 

The present study focuses specifically on the processing of bound morphemes marking noun 
case by Yugoslavian agrammatics who were native speakers of Serbo-Croatian. We chose to use 
elicited grammaticality judgments as the task in order to avoid introducing extraneous processing 
factors that would otherwise be confounded with syntactic parsing in object-manipulation and 
picture-verification tasks. The Serbo-Croatian-speaking agrammatic aphasics were tested for 
retained sensitivity to noun inflections in the context of the contrast between transitive and 
intransitive verbs. This maneuver allowed us to test Grodzinsky's hypothesis that distinctions 
within the same closed-class category should be lost in agrammatism. 

In the Serbo-Croatian language, subcategorization is related not only to the meaning but 
also to the syntactic structure of a noun. Both transitive and intransitive verbs can be directly 
followed by a noun phrase. If the verb is transitive, however, it must be followed by a noun in 
the accusative case. This feature of Serbo-Croatian offers the opportunity to create transitive vs. 
intransitive sentences that are minimal pairs. Sentences of both types can be constructed so as 
to be identical except for the terminal noun suffix. This suffix alone may differentiate a transitive 
from an intransitive sentence. In English it is impossible to create such minimal pairs because an 
intransitive verb in English cannot be directly followed by a noun phrase, whereas a transitive verb 
must be. (These differences between the two languages are diagrammed schematically in Figure 
1.) In English, but not in Serbo-Croatian, such differences in subcategorization necessarily bring 
with them differences in prosody and length. 

Figure 1. The diagram compares the form of the verb phrase for transitive and intransitive verbs in English and 
Serbo-Croatian. Note that in Serbo-Croatian, unlike in English, a noun may follow the verb directly for either a 
transitive or an intransitive verb. Intransitivity is marked by some other case than the nominative or accusative. 

Some evidence has already been obtained, using the grammaticality judgment task, that 
English-speaking agrammatics are sensitive to the kind of strict subcategorization information 
that is conveyed by transitive vs. intransitive verbs. However, if Grodzinsky's hypothesis is 
correct, then Serbo-Croat agrammatics, unlike English-speaking agrammatics, should not be 
sensitive to this subcategorization property of verbs. This is expected because, in Serbo-Croatian, 
transitivity is captured by affixation and not by word order. Accordingly, Serbo-Croat aphasics 
should be unable to tell whether there is agreement between a specific verb and the case 
inflection of a following noun. This is just the kind of cross-language difference that is expected, on 
Grodzinsky's account, if agrammatism has a linguistic basis. 

The ability of Serbo-Croat agrammatics to use subcategorization information was tested by 
manipulating the case endings of nouns that follow either transitive or intransitive verbs. We 
wanted to discover whether the subcategorization facts associated with transitive verbs are more 
accessible to agrammatic aphasics than those associated with intransitive verbs for 
grammaticality decisions that turn on noun case. Clearly, Grodzinsky's hypothesis would predict that the 
two classes of verbs should be treated in the same way, so that performance on judgments of 
grammaticality would be roughly at chance for each. This question was put to an empirical test 
in our experiment. 

To summarize, a much debated issue in neurolinguistics is whether the syntactic deficits of 
agrammatics in speech production have parallels in speech comprehension. The hypothesis implies 
that there is some central syntactic processing component that is impaired in agrammatism, and 
that it is a cause of both comprehension and production difficulties. Our research addresses 
this issue by focusing on receptive processes in agrammatism from a cross-language point of 
view. The study had two purposes: first, to identify universal, cross-language characteristics of 
agrammatism and second, to exploit special characteristics of the Serbo-Croatian language in 
order to test Grodzinsky's challenging hypothesis that distinctions within the same closed-class 
category should be lost in agrammatism. 

Subjects 

The subjects, who ranged in age from 30 to 57 years, were six nonfluent aphasics, two females 
and four males, all right-handed. Their characteristics are summarized in Table 1. In each case, 
the lesion was confined to the left hemisphere. All had completed at least secondary school. All 
were outpatients of the Clinic for Neurophysiology and Speech Pathology in Belgrade, Yugoslavia. 

Four patients carried the diagnosis of stroke, one was a victim of traumatic insult, and 
one had a surgically removed tumor. Time since onset of the disorder was at least six months. 
Three patients (B.S., D.M., and R.N.) were initially mute. CT scans, available for the trauma 
patient, the tumor patient, and one stroke patient, revealed a lesion predominantly in the inferior 
posterior region of the left frontal lobe. In addition to the general neurological examination, 
diagnostic criteria included performance on the Boston Diagnostic Aphasia Examination, which 
was translated and adapted for Serbo-Croat speakers. Comprehension was relatively good in 
social contexts, but, as may be seen from the BDAE scores (Table 2), each subject had significant 
impairment in comprehension. 

Table 1 

Characteristics of the Aphasic Subjects 

                                                        Time post- 
Subject     Age     Sex     Education     Etiology      onset 

B. S.       31      M       14            Trauma        3 Years 
D. K.       36      M       14            CVA           6 Months 
D. M.       33      F       16            Tumor         4 Years 
R. N.       57      F       16            CVA           3 Years 
S. P.       53      M       16            CVA           5 Years 
N. M.       57      M       16            CVA           1 Year 

In their speech production, all the subjects demonstrated severe-to-moderate agrammatic 
speech. That is, their speech was effortful, dysprosodic, and telegraphic. Each of the patients 
made notable production errors on case endings, often using the nominative case in linguistic 
contexts in which this case was inappropriate. However, none of these errors resulted in nonwords. 
Representative examples of speech production are given in Table 2. 

A group of normal subjects, matched in age and education, was also included in the 
experiment. 

Materials 

The experimental materials consisted of 64 grammatical and 64 ungrammatical sentences, 
each containing three words (noun-verb-noun). Half of the grammatical sentences incorporated a 
transitive verb followed by the accusative object noun and half incorporated an intransitive verb 
followed by an adverbial noun, usually in the instrumental case. All the words in the sentence 
were balanced for length and frequency of occurrence. By crossing verb transitivity with the case 
inflection of the final noun, four forms of each sentence were generated, as shown in these examples: 

1) Seljak obradjuje polje. 

(The farmer is cultivating the field.) 

2) Seljak trci poljem. 

(The farmer is running through the field.) 

*3) Seljak obradjuje poljem. 

(The farmer is cultivating through the field.) 

*4) Seljak trci polje. 

(The farmer is running the field.) 

Table 2 

Performance on Selected Portions of the BDAE 

Subject   BDAE Comprehension                Speech Production* 
          A         B       C       D 

B. S.     16/20     4/15    9/12    8/10    Pa.. mama brise tanjir. De...decko..kolaci.. 
                                            devojcica uzmi uz.. uzima.. Voda curi... 
                                            Well.. mama is drying the plate. The b..boy 
                                            ...cookies...the girl take ta...is taking... 
                                            The water is leaking. 

D. K.     20/20     5/15    6/12    6/10    Voda. Devojcica. Sudove pere. Voda tece. Dete 
                                            i devojcica se...Ne znam da kazem. Kolace. 
                                            Devojcica se oklize i pala. Gotovo. 
                                            The water. The girl. Washing the dishes. 
                                            Water is leaking. The kid and the girl...I 
                                            don't know how to say. Cookies. The girl is 
                                            slipping and fallen. End. 

D. M.     18.5/20   12/15   8/12    7/10    Brat i sestra. Hoce kolace. Mama pere. 
                                            Ta.. tanjir. Voda ... Ne mogu da kazem. 
                                            Brother and the sister. Want the cookies. 
                                            Mama is washing. The pla...plate. Water. 
                                            I can not say. 

R. N.     19/20     10/15   6/12    2/10    Mama i tata...ne, brise sudove. Ne mogu da 
                                            kazem. Ne mogu da kazem. Vidi kako ovde 
                                            drzi...Ne mogu da kazem. 
                                            Mama and daddy...no, drying dishes. I can 
                                            not say. I can not say. Look how is holding.. 
                                            I can not say. 

S. P.     18.5/20   14/15   6/12    7/10    Kujna. Mama pere...ovaj tanjir. A ovaj 
                                            decak i devojcica. To je..Daje sestri kolace. 
                                            Ova je voda pri..pri..E, voda je pr-li-la. 
                                            Voda je prelila u sudoperu. Solja. 
                                            Kitchen. Mama is washing...this plate, 
                                            boy and girl. This is..Is giving cookies to 
                                            the sister. This water...is li..li.. Water is 
                                            lea-king. Water is leaking into the sink. 
                                            The cup. 

N. M.     12.5/20   [illegible]/15  6/12  6/10  Ovaj...dete je ustalo da pojede pekmez a ova 
                                            zena je prosula vodu sto je htela da pere. Pa 
                                            je sve oprala. 
                                            This..child got up to eat the jam and the 
                                            woman has spilled the water cause she wanted 
                                            to wash. She washed everything. 

A - Body Part Identification          D - Reading Sentences and Paragraphs 
B - Commands                          * Patient's description of "Cookie theft" picture, BDAE 
C - Complex Ideational Material 

It will be noted that in each of the above sentences, the correct grammatical form depends on just 
the last (unstressed) phoneme of the last word in the sentence. It should be noted also that some 
of the critical nouns preserve their lexical well-formedness even when they appear in unmarked 
forms (i.e., nominative and accusative). 
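
The way the four sentence forms arise from two crossed factors can be sketched in a few lines of 
code. The sketch below is illustrative only: it uses the single example item given above, and the 
rule that an intransitive verb requires the instrumental case is a simplification adopted for the 
example (the text says only that the adverbial noun is usually in the instrumental). All names in 
the code are hypothetical. 

```python
from itertools import product

SUBJECT = "Seljak"  # "the farmer"

# Verb -> transitivity, taken from the example sentences
VERBS = {"obradjuje": "transitive", "trci": "intransitive"}

# Noun form -> case, taken from the example sentences
NOUN_FORMS = {"polje": "accusative", "poljem": "instrumental"}

# Case assumed to be required by each verb type (simplified for this example)
REQUIRED_CASE = {"transitive": "accusative", "intransitive": "instrumental"}


def build_items():
    """Cross the verbs with the noun forms and label each sentence's grammaticality."""
    items = []
    for (verb, transitivity), (noun, case) in product(VERBS.items(), NOUN_FORMS.items()):
        grammatical = REQUIRED_CASE[transitivity] == case
        items.append((f"{SUBJECT} {verb} {noun}.", transitivity, case, grammatical))
    return items


for sentence, transitivity, case, grammatical in build_items():
    mark = " " if grammatical else "*"  # asterisk marks an ungrammatical sentence
    print(f"{mark}{sentence:28s} {transitivity:12s} {case}")
```

Each grammatical sentence differs from its ungrammatical counterpart only in the ending of the 
final noun, which is the property the design relies on. 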

Design and Procedure 

The sentences were tape recorded and systematically distributed in four groups. Each sen- 
tence was read once, with normal speed and intonation. Ungrammatical sentences were read with 
the intonation appropriate for the corresponding grammatical sentence (i.e., with a correct case 
inflection). The subjects listened to the sentences over headphones. Their task was to indicate for 
each sentence whether it was grammatically correct or not. The subjects responded by pressing 
one of two keys, marked YES and NO. Each subject participated in four individual sessions, one 
session per week for four consecutive weeks. Each new session started with ten practice sentences 
to familiarize the subject with the task. 

Results 

First, we present an analysis of the error data by subject. Percent of errors for each sentence 
type is given in Table 3. 

Table 3 

Percent of Errors for Aphasic Subjects by Sentence Type 

             Grammatical sentences           Ungrammatical sentences 

             Transitive     Intransitive     Transitive     Intransitive 
Subject      verb           verb             verb           verb 

B.S.         10.8           14.0             16.0           24.0 
D.K.          4.0            6.0              4.0           10.8 
D.M.          0.0            8.0             12.0            6.4 
R.N.         10.8           10.8              9.2           14.0 
S.P.          6.0            9.2             14.0           16.0 
N.M.          1.6            4.0              8.0           10.8 

Mean          5.5            8.7             10.5           13.7 

The table shows that the error percentage scores of the individual subjects covaried with the 
severity of their aphasia, as measured by neurologists' ratings. It is important to note, however, 
that all of the subjects were well above chance level in responding correctly to the inflections of 
the terminal word in the target sentences. The same pattern of errors is apparent for all subjects 
despite differences in etiology, age, and severity. 

Also shown in Table 3 is the analysis of errors by sentence type. Sentences of Type 4 evoked 
the most errors (i.e., grammatically incorrect sentences with an intransitive verb), and those of 
Type 1 evoked the fewest errors (i.e., grammatically correct sentences with a transitive verb). 

The error data were subjected to analysis of variance by subjects and by items, comparing 
the factors of group, grammaticality, and transitivity. In both the analyses by subjects and by 
items there was a significant effect of grammaticality, F1(1,5) = 16.74, p < 0.01; F2(1,31) = 
8.73, p < 0.01. This means that grammatically correct and grammatically incorrect sentences 
were not equally difficult for the subjects. It proved to be easier for the subjects to give a correct 
judgment when the correct inflections were presented. 

Analysis of the false alarms indicates that this effect is not due to the tendency for Broca-type 
aphasics to be "over-accepting." The fact that they correctly rejected ungrammatical sentences 
88% of the time is clear evidence of their retained sensitivity to the closed-class morphology, both 
in accepting grammatically correct sentences and in rejecting ungrammatical sentences. 
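
The column means in Table 3 and the 88% correct-rejection figure can be recomputed from the 
individual error percentages. The sketch below transcribes the subject rows of Table 3; it 
assumes that the two ungrammatical sentence types contributed equal numbers of trials, so that 
their error rates can simply be averaged. The variable and function names are hypothetical. 

```python
# Percent errors per subject, transcribed from Table 3, in the order:
# (grammatical/transitive, grammatical/intransitive,
#  ungrammatical/transitive, ungrammatical/intransitive)
ERRORS = {
    "B.S.": (10.8, 14.0, 16.0, 24.0),
    "D.K.": (4.0, 6.0, 4.0, 10.8),
    "D.M.": (0.0, 8.0, 12.0, 6.4),
    "R.N.": (10.8, 10.8, 9.2, 14.0),
    "S.P.": (6.0, 9.2, 14.0, 16.0),
    "N.M.": (1.6, 4.0, 8.0, 10.8),
}


def column_means(errors):
    """Mean error percentage for each of the four sentence types."""
    columns = zip(*errors.values())
    return [sum(column) / len(column) for column in columns]


means = column_means(ERRORS)
print([round(m, 1) for m in means])  # [5.5, 8.7, 10.5, 13.7]

# Assumes equal trial counts in the two ungrammatical cells
ungrammatical_error = (means[2] + means[3]) / 2
print(round(100 - ungrammatical_error, 1))  # 87.9
```

The recomputed correct-rejection rate of 87.9% agrees with the rounded figure of 88% given in 
the text. 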

The effect of transitivity was also significant both by subjects and by items: F1(1,5) = 
10.00, p < 0.025; F2(1,31) = 7.41, p < 0.01. This indicates that these agrammatic subjects were 
sensitive to subcategorization requirements that, as we saw, require them to attend to noun 
inflections. We interpret this result to mean that Broca-type aphasics have preserved information 
in their lexicons about the complements of verbs, retaining whether or not they obligatorily 
require a direct object. Presumably, such stored information serves to "prime" the correct noun 
inflections by generating a syntactic expectancy for a particular case ending. 

A comparison of the accuracy of judgments by aphasic patients with those of control subjects 
demonstrated that although the patients' performance was relatively successful, it was clearly 
depressed compared to the nearly error-free performance of control subjects. The control subjects 
detected ungrammatical sentences with an average accuracy of 99.2%, and they correctly 
identified grammatical sentences 98.6% of the time. 

An interesting post-hoc observation was made concerning the lexical items that preserve 
their lexical well-formedness even in the unmarked form. It should be noted that for some nouns 
the unmarked nominative case is identical to the word stem. For these nouns the other case- 
inflections are simply appended to the stem (as in Table 4, Column 1). These nouns keep their 
lexical well-formedness even when the case inflections are neglected. For all other nouns (as in 
Column 2), the nominative form and other case forms are different from the word stem. For the 
latter class of nouns, the stem needs a case inflection in order to be a word. 
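
The distinction between the two noun classes can be stated compactly in code. The sketch below 
transcribes the two paradigms from Table 4 and checks, for each stem, whether the bare stem 
itself occurs as a case form. The data structure and function names are hypothetical and serve 
only to illustrate the distinction. 

```python
# Case suffixes transcribed from Table 4; "" marks a form identical to the bare stem
PARADIGMS = {
    "sto": {   # Class 1, "table"
        "nominative": "", "genitive": "la", "dative": "lu",
        "accusative": "", "instrumental": "lom",
    },
    "farb": {  # Class 2, "color"
        "nominative": "a", "genitive": "e", "dative": "i",
        "accusative": "u", "instrumental": "om",
    },
}


def inflect(stem, case):
    """Return the case form of a noun by appending the suffix to the stem."""
    return stem + PARADIGMS[stem][case]


def stem_is_word(stem):
    """True if the bare stem occurs as one of the noun's case forms."""
    return any(suffix == "" for suffix in PARADIGMS[stem].values())


for stem in PARADIGMS:
    forms = [inflect(stem, case) for case in PARADIGMS[stem]]
    print(stem, forms, "stem is a word:", stem_is_word(stem))
```

For the Class 1 noun the stem remains a word when the inflection is dropped, whereas the Class 2 
stem does not, which is the contrast exploited in the post-hoc comparison. 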

Grodzinsky (1984) has proposed that agrammatics should have difficulty processing 
inflections of the first class of nouns but not the second class. In the case at hand, this hypothesis 
would predict that aphasics should make more mistakes when they are processing a sentence in 
which the critical noun-item belongs to the first class of nouns (nouns in the unmarked 
nominative case). For example, an aphasic subject should reject a grammatically correct sentence in 
which the critical noun is inflected with some case other than the nominative or accusative case. 
This would happen if the subjects were capable of processing the noun only by treating it as if it 
were in the nominative (unmarked) case. On the other hand, whenever the critical noun is in the 
nominative case, a subject should have a tendency to accept the sentence even if ungrammatical. 

Table 4 

Case Inflections for Two Classes of Nouns in Serbo-Croatian 

                 Class 1          Class 2 

Nominative       sto- (table)     farb-a (color) 
Genitive         sto-la           farb-e 
Dative           sto-lu           farb-i 
Accusative       sto-             farb-u 
Instrumental     sto-lom          farb-om 


However, it was found that nouns in the marked cases were not significantly more difficult than
those in the unmarked case for our subjects. This finding disconfirms Grodzinsky's prediction.



The main result was that agrammatics in this study proved to be capable of using bound 
closed-class morphemes in sentence processing. Each of the six Serbo-Croat-speaking agrammatic 
patients showed evidence of retained ability to respond selectively to noun inflections marking 
noun case and verb transitivity. The finding of retained syntactic competence is consistent with 
earlier findings of Smith and Mimica (1984) in Serbo-Croatian and of Heeschen (1980) in German. 

The findings are also consistent with recent work with English-speaking agrammatics who 
showed a retained ability to perform judgments of sentence grammaticality (Linebarger et al., 
1983). Further, the results are consistent with the indications that agrammatic aphasics are 
capable of carrying out syntactic analyses on line (Crain et al., 1984; Tyler, 1985). The subjects 
of the present study, like their English-speaking counterparts, demonstrated retained sensitivity 
to syntactic category even when the category is marked by affixation and not by word order or
by free-standing grammatical morphemes. 

As noted in the introduction, this result could not have been presupposed. It is conceivable 
that agrammatics would be capable of exploiting one indicator of syntactic category, but not
another. Given the indications that agrammatics are deficient in use of closed-class vocabulary 
items, one might be led to suppose that some ability to use word order is retained while ability to 
use the closed class morphology is lost. The structure of English does not permit us to distinguish 
between these possibilities, because word order and the introduction of prepositions such as to
and by are the only devices available for marking noun case. But, as we noted, the Serbo-Croatian 
language, by virtue of its rich inflectional system, enables us to test the effect of relying solely on 
inflectional morphemes for marking case. Both transitive and intransitive verbs can be directly 
followed by a noun phrase. This feature of Serbo-Croatian made possible the creation of transitive 
and intransitive sentences that are minimal pairs, differing solely in one noun suffix. 

In summary, our Serbo-Croatian-speaking agrammatic patients showed retained sensitivity to
noun inflections marking the transitive/intransitive verb within the context of the sentence
judgment task. The error rate was remarkably low, averaging only 12% for aphasic subjects across
conditions. The finding of preserved sensitivity to case in this context clearly fails to support 
Grodzinsky's (1984) hypothesis that distinctions within the same closed-class category should be 
lost in agrammatism. 

The questions we raised about the ability of agrammatics to use closed class morphology in 
sentence processing were also addressed in the recent research of Smith and Mimica (1984), to
which we have referred. In that study, also, Serbo-Croatian-speaking agrammatics showed better 
than chance ability to use case inflection in the assignment of agent-object relations, but the error 
rate was much higher than in the present study. The large differences in rate of correct responses 
may be attributable to the task. Smith and Mimica used an object manipulation task, which is
known to impose a considerable burden on short-term memory processes (Hamburger & Crain, 
1984). 

An explanation of agrammatics' performance failures in terms of processing limitations rather 
than loss of syntactic competence is fully in line with the other findings of the Smith and Mimica 
study. These investigators explored the effects on the acting out task of association or dissoci- 
ation of three variables: word order, animacy, and case inflection. When all three factors were 
concordant, Broca's aphasics performed with only 10% errors, whereas when two factors were
in competition, performance fell to near chance level (42% errors). In their terms, situations of 
conflict, such as that created by use of a nonpreferred word order, create "cognitive overload." 

Taken together, the findings of the present study are consistent with other research both 
on richly inflected languages and on fixed word order languages like English. The weight of the
evidence supports the view that comprehension deficits in agrammatism do not reflect loss of 
either the knowledge or ability to access members of the closed-class lexicon in extracting the 
syntactic structure of a sentence. Access to grammatical knowledge is impaired, to be sure, but 
access can often be attained successfully in circumstances that minimize processing load. 

A comparison of agrammatics' performance across tasks shows that subjects who standardly
fail in an object manipulation task may succeed in a grammaticality judgment task that tests
comprehension of the same linguistic structures. This implies that all necessary syntactic
structures may be preserved in the so-called agrammatism of Broca's aphasia and that problems in
some other part of the language apparatus are responsible for failures of comprehension. There
is increasing support for the view that complex behaviors are products of an interaction between
many different and independent subsystems, each performing a unique and special role. In
agrammatism, a likely source of comprehension problems is a verbal working memory limitation. There
is evidence that the phonological processing system on which the verbal working memory depends
is often damaged in the nonfluent aphasias (Caramazza, Berndt, & Basili, 1983; Martin, 1985).

In sum, the findings of the present study are consistent with the main body of research on 
sentence processing in Broca's aphasia in suggesting that the link between linguistic competence 
and linguistic performance is not fully preserved. Tacit knowledge of syntax is seen to be intact 
under circumstances that tax working memory as little as possible. However, linguistic knowledge 
is less accessible in contexts, including many everyday contexts, that place heavy demands on 
working memory. Thus, the data we have reviewed implicate disturbances of language subsystems
other than the syntactic component and suggest that studies investigating the role of such
processing components as working memory will be important in the future.


INTENTIONALITY: A PROBLEM OF MULTIPLE REFERENCE
FRAMES, SPECIFICATIONAL INFORMATION, AND
EXTRAORDINARY BOUNDARY CONDITIONS ON NATURAL LAW*

M. T. Turvey†

It is refreshing to see a scholar who is largely sympathetic to the so-called information process- 
ing or representational/computational approach to cognitive systems recognizing its fundamental 
inadequacies. To be blunt, that approach fails to come to terms with either information or inten- 
tionality. Sayre's response to these inadequacies, however, keeps close to the received view. He 
assumes that a biologically and psychologically relevant sense of information can be provided by 
the mathematical theory of communication; he assumes that intentionality amounts to represen- 
tation. These assumptions are bolstered by the closely cognate beliefs that intentionality is to be 
ascribed to some roughly midway-state in the classical afferent-efferent link and that there is a 
metamorphosis from meaningless states to meaningful states. To his credit, Sayre aspires to make 
the representations genuine. He wants them to stand for real things. He wants the transition from
meaningless sensory states to meaningful perceptual states to be (mathematically) principled. 

From my perspective as a proponent of the ecological approach to perceiving-acting (see
Gibson, 1979; Turvey, Shaw, Reed, & Mace, 1981), Sayre's sentiments are right but his premises are
wrong. Not surprisingly, I find his treatment of intentionality disappointing. I concur with Sayre's
implicit wish for a concerted effort to naturalize (my word) intentionality, but my preference is to
keep the deliberations very close to natural science and the search for lawful regularities. Sayre
is quite right in his assessment that an attempt to devise an explanation of intentionality in the
Turing reductionism/token physicalism perspective of cognitive science (which denigrates
intentionality to the states of a computational device) does not have a "ghost of a chance" (Carello,
Turvey, Kugler, & Shaw, 1984; Turvey et al., 1981). But he is quite wrong, I believe, in suggesting 
that pursuing the purer equation of intentionality with representation (relieved of computational 
procedures) can fare any better. 

* The Behavioral and Brain Sciences, 9, 153-155, 1986. Commentary on Sayre, K. M. (1986).
Intentionality and information processing: An alternative model for cognitive science. The
Behavioral and Brain Sciences, 9, 121-138.

† Also University of Connecticut.

Intentionality is directedness toward objects. Locomoting terrestrial animals, including
humans, direct themselves through openings and around barriers. They direct their limbs in certain
ways with respect to a brink in a surface, directing them one way if the brink is where they can
step down and another way if it must be negotiated by jumping. Gibson (1966, 1979; Reed &
Jones, 1982) advocated mutually constraining theories of animals and environments (see Alley,
in press; Mace, 1977; Michaels & Carello, 1981) as the basis for an understanding of
perceiving-acting that addressed such mundane intentional behavior. (This central thesis of the ecological
approach, the duality of animal and environment [Shaw & Turvey, 1982], implies that efforts to
ground intentionality only in "environmental constraints" will miss the mark. Duality, by the
way, is not dualism.) Gibson pursued a perceptual theory that was fundamentally intentional
rather than one that is made intentional as an afterthought. With considerable care he identified
how an understanding of the intentionality of perceiving poses challenges for science on several
fronts, and how these challenges might be met. I will describe two of them.

The first challenge is to describe the layout of surfaces with reference to the animal. This
move is continuous with the larger lesson of relativity theory: All state descriptions are frame
dependent. Reference frames are substantial and are not to be confused with the coordinate
systems that abstractly represent them. The properties of an animal to which surface layout
must be referred are basically the animal's magnitudes, its morphology, its metabolism. With
regard to a brink, the separation of surfaces is in reference to limb magnitudes. Obviously a given
brink can be referred to multiple, equally real frames. One frame is the terrestrial frame with
distances and durations measured in arbitrary units. This frame is useful to the physicist but it is,
by definition, animal-neutral. (In the received view it is mistakenly adopted as the sole objective
frame.) Other frames are individual animals. Consequently, the same brink in the terrestrial
frame is a place negotiable by leg extension in the frame provided by one (larger) animal but not
negotiable in this fashion in the frame provided by another (smaller) animal.

A second challenge is to describe how animals can be informed about these frame-dependent
environmental properties (affordances) to which their activities are directed. There are two senses
in which the term information is used (cf. Turvey & Kugler, 1984). In the indicational/injunctional
sense information consists of symbol strings identifying states of affairs ("the situation is
so-and-so") or things to be done ("do so-and-so now"). Information in this sense is underconstraining,
like a stop sign. The other sense is the specificational sense of Gibson (1979). In the case of
vision, information is optical structure lawfully generated by facts: properties of surface layout,
properties of an animal's movements. This structure does not resemble the facts; rather it is
specific to them. The ecological argument is that information in the specificational sense meets
the above challenge. I will give some examples shortly but I wish to preface them by noting
what's at issue in the contrast between the two senses of information.

The indicational/injunctional sense, I believe, fits neatly into a tradition that takes the
primary perceptual activity to be discriminating among members of a set and the equilibrium
thermodynamics of closed systems as the branch of physics to which discussions of information
can be meaningfully referred. In such a system the states are enumerable from the outset. To
put it very roughly, the information notion only has to address their individual probabilities,
thereby providing a basis for discriminating among them. Living things, however, are open
systems. The animal-environment system, in which an animal participates as one of the two
mutually tailored components, is open. Significantly, the states of an open system need not be
fixed at the outset. Given fluctuations in the microstructure and nonlinearities, a scaling up in
one or more variables discontinuously decreases an open system's symmetry. More constraints
arise. The system becomes more ordered. New states come into existence. Consequently, the
order principle and complexions of Boltzmann, and the notion of information that they sustain,
are of limited applicability to open physical systems (e.g., Prigogine, 1980), including
animal-environment systems.

Open (evolving, developing) systems motivate a different notion of information from closed
systems (Kugler, Kelso, & Turvey, 1982; Kugler & Turvey, in press). Sayre makes an offhand
remark about the information in the genes and in the phenotype. Efforts to apply classical 
information theoretic notions to the genotype-phenotype link, conceived as a communication 
channel, have largely been dismissed. In intuitive terms, the dismissal is based upon a feeling 
that an information metric should recognize the greater complexity of the full-fledged animal 
(Waddington, 1968). Even where the open-closed distinction is sidestepped, as in Pattee's (1973, 
1977) thoroughgoing and celebrated efforts to detail the problem of a physical interpretation of 
"genetic information," the conceptions of the mathematical theory of communication have proven 
to be of little value. 

The specificational sense of information is consistent with the perspective that takes
perceiving the persisting and changing properties of a thing as primary. For Gibson (1966, 1979)
the fundamental question is how to characterize the information that supports the perceiving of
P; the question of how to characterize the information that supports distinguishing P from Q,
R, and so on is secondary and derivative. Suppose that P is the animal itself. In locomoting, a
terrestrial animal generates forces that displace it relative to the surroundings. There are obvious
mechanical regularities to be noted. They are ordinarily expressed through Newton's laws. But
this situation also exhibits nonmechanical regularities expressed by non-Newtonian laws of wide
(though not universal) scope. For instance, all the densely nested optical solid angles, whose
bases are the faces and facets of surfaces and whose apex is the point of observation, change
concurrently. An optical flow field (crudely, a smooth velocity vector field) is generated. The global
form of the flow, or optical morphology, is specific to the configuration of locomotory forces and to
the displacements of the animal. Rectilinear forward locomotion, for example, lawfully generates
a dilating parabolic flow; a dilating parabolic flow specifies rectilinear forward locomotion.

This simple but significant example of information in the specificational sense permits me to
make briefly some important points that can be more carefully developed (e.g., Solomon, Carello,
& Turvey, 1984; Turvey & Carello, 1985, 1986; Turvey et al., 1981). First, optical information
in the specificational sense is optical structure whose macroscopic, qualitative properties are
nomically dependent upon and specific to (under natural boundary conditions) properties of the
animal-environment system. Second, optical information in the specificational sense does not
reduce to neural signals in the visual system (see below). Thinking about optical information
as alternative (macroscopic, qualitative) descriptions of the photon light field, structured by the
layout of material surfaces and defined relative to locations and paths in the transparent medium
(air for terrestrial animals), is useful. It aids an understanding of optical information independent
of vision and of the kinds of ocular systems that evolved. Optical information in the specificational 
sense is tied to laws at the ecological scale, laws that relate optical properties to kinetic properties 
(of the animal-environment system). The ecological approach argues that these laws were the basis 
for the evolution of, and are the basis for the everyday realization of, locomotor activity and its 
directedness and intentionality. 

Let's extend the example a little. Dilation of an optical solid angle relative to a point
of observation specifies the approach of a substantial surface. The inverse of the relative rate
of dilation, τ, specifies when the collision will occur if the current kinetic conditions persist
(Lee, 1980). And the rate at which τ changes has a critical point property below which it
specifies that the upcoming collision will be hard (Kugler, Turvey, Carello, & Shaw, 1985; Lee,
1980). The foregoing are not so much quantities as they are local flowfield morphologies and
their changes. They specify pending states. They make possible the synchronizing of acts with
events, that is, the prospective control of basic behavior. They are meaningful in a very pragmatic
sense of the word. Speaking in Dennett's (1983) terms, information in the specificational sense has
"intentional features." And to echo Gibson's (1966, 1979) longstanding gripe, the "meaningless
to meaningful" problem with which Sayre struggles is not a problem. (Coming to terms with
the laws at the ecological scale on which the intentionality of perceiving-acting is founded, and
figuring out how to formulate and systematize them, now that's a problem!)
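
The time-to-contact claim above can be illustrated numerically. What follows is only a minimal sketch under stated assumptions, not a procedure taken from Lee (1980): the surface width, distance, and approach speed are hypothetical values, the function names are invented for illustration, and the approach is assumed to be head-on at constant speed. The sketch computes the optical angle subtended by the surface, estimates its rate of dilation by a finite difference, and takes the inverse of the relative rate of dilation. Distance and speed are used only to generate the optical angle; the estimate itself is formed from the angle and its rate of change alone.

```python
import math


def visual_angle(size, distance):
    """Optical angle (radians) subtended at the point of observation by a surface of the given width."""
    return 2.0 * math.atan(size / (2.0 * distance))


def tau_from_optics(size, distance, speed, dt=1e-4):
    """Estimate time-to-contact from the optical angle and its rate of dilation."""
    theta_now = visual_angle(size, distance)
    theta_next = visual_angle(size, distance - speed * dt)
    dilation_rate = (theta_next - theta_now) / dt  # finite-difference d(theta)/dt
    return theta_now / dilation_rate  # inverse of the relative rate of dilation


# Hypothetical values: a 0.5 m wide surface, 20 m away, approached at 4 m/s.
size, distance, speed = 0.5, 20.0, 4.0
tau = tau_from_optics(size, distance, speed)
true_time = distance / speed  # kinetic time-to-contact, for comparison only
print(f"optical estimate: {tau:.3f} s, distance/speed: {true_time:.3f} s")
```

For small optical angles the two printed values agree closely, which is the sense in which the optical variable is specific to the pending collision when the current kinetic conditions persist.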

Said succinctly, there is a description of optical structure under which its detection guarantees
the intentionality of perceiving. There are other descriptions of optical structure under which it
must be translated or processed or interpreted or embellished to make perceiving intentional. Sayre
is playing with one such description. In this respect it is important to note that Gibson (1966,
1979) avidly denied that optical information in the specificational sense was the sort of thing that
could be "processed." It is bizarre, therefore, for Sayre to claim that Marr (1982) is on target
with his criticism that Gibson underestimated the complexity of visual information processing.
There is a clash of metaphors here. Marr and Sayre are operating in the orthodox metaphor
of the nervous system as an efficient cause; for example, it produces percepts. Gibson (1966)
sees the nervous system as functioning vicariously in perceiving. It is a part (albeit extremely
rich) of the supportive basis for the expression of natural cum ecological laws (cf. Ben-Zeev,
1984). An understanding of the nervous system's role in vision in the support metaphor will
be radically different from the processing/producing understanding subscribed to by Marr and
Sayre (Kugler & Turvey, in press). At all events, in the ecological view, optical descriptions that
invoke processing to render intentionless inputs into intentional percepts are of the wrong kind.
They beg too many questions and they cast intentionality as a derivative rather than a primary
phenomenon.

The last sentences, of course, are just another way of saying that intentionality should not be
reduced to representation. As I remarked above, Sayre's goal of disengaging intentionality from
computational procedures is admirable; his insistence on the intentional-representational equation
is not. That equation, as I have been trying to stress, diverts us from addressing intentionality in
a way that reveals its position in the natural order of things. Consider the following. What are
customarily referred to as an animal's or person's intentional contents (cf. Dennett, 1969; Searle,
1983) constitute extraordinary boundary conditions on natural law (especially those laws that
are particularly pertinent to the ecological scale). A flying animal aiming to collide gently with a
surface will synchronize its deceleration with one value of τ; an acceleration to produce a timely,
violent collision will be generated with respect to another value of τ (e.g., Lee & Reddish, 1981;
Lee, Young, Reddish, Lough, & Clayton, 1984; Wagner, 1982). In these simple examples the final
conditions (the animal's intentional content) specify the initial conditions that a law (relating
optical properties to kinetic conditions) must assume. Examples like this abound, and one of
them has been investigated quite thoroughly (Kugler & Turvey, in press). They suggest a
profound challenge for naturalizing intentionality: understanding the principles by which intentional
contents harness natural laws.